Encryption is the art of making a message or other data safe for delivery over a medium where it may be intercepted by an unintended party. I say 'art' because it often seems that way. It can be a very complex topic, made more so by an often confusing array of (dis)information from mainstream media, technical media, software/hardware vendors, (quasi-)security experts, and white/gray/black hat crackers. I encourage developers who are trying to understand encryption well enough to implement systems which use it to read as much as they can on the topic, but with an extremely critical eye. I will attempt to share some information from my own experiences here, in a very small series of articles on the topic.
Today, I will cover some broad theory and general dos and don'ts. In the near future, I will share some more detailed information regarding actual implementation in code. Finally, I will share a library I developed for .NET developers to encapsulate some encryption code which does not always behave as one might intuitively expect.
What is Encryption? What is it Not?
Encryption involves four specific items:
- Plaintext Data is the data you are protecting. 'Plaintext' is the term frequently used, but this does not literally need to be text.
- A Cypher Algorithm is the method you are using to protect the data. You can think of this as the literal code implementation.
- A Key is the 'secret ingredient' which is used with the cypher and the plaintext. Just as with a lock on a door, one must possess the key - or be able to fake the key - to lock or unlock what is being protected. Sometimes a key actually has multiple parts, such as in asymmetric algorithms, which have 'public' and 'private' keys.
- Cypher Text is the data after the key and cypher have been applied, and is the information which is transmitted over the 'public' means, such as the Internet. Again, use of the word 'text' is not meant to limit this only to being literal text. The ideal result of encryption is that anyone at all can get hold of your cypher text, and they should not be able to figure out what the original message was.
Is 'Perfect Secrecy' Possible?
One-Time Pad
The name literally comes from the fact that when it was originally conceived of and used, the key would be written on two pads of paper, one for each conversant. As the key data on one page was used up, that page was to be torn off and discarded. Any re-use of the key data would make it fairly easy to crack the code. So, the pad could only be used one time. The messages, therefore, had to be extremely short.
It is said that Soviet spies once used the one-time pad to encrypt messages. However, some of that traffic was broken by U.S. codebreakers, in an effort known as the Venona project which the NSA later continued. Why? Because they began to re-use the key data.
One-Time Pad FAQ
Practically? No. Literally? Yes. In fact, in the early 20th century, an algorithm was created which is recognized as achieving perfect secrecy. Called a "One-Time Pad", this method generates cypher text from which it is mathematically impossible to deduce the plaintext, when used 100% correctly. However, this algorithm has a requirement that makes it effectively impossible to use correctly in the real world: for each volume of plain text being encrypted, an equal volume of key data is required, and that key data must be generated in a truly random manner. With any reuse of key data, or any predictability in the generated keys, it can be almost trivial to break cypher text protected with a One-Time Pad. Attempts have been made to remove the 'One-Time' nature of this cypher, but its extreme simplicity dooms any such attempts to failure.
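To make that concrete, here is a minimal Python sketch of the idea, assuming the pad is simply XORed with the message byte by byte. The function names are my own, chosen for illustration. The last few lines show why reuse is fatal: if two messages share a pad, XORing the two cypher texts cancels the pad entirely and leaves the XOR of the two plaintexts.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def otp_encrypt(plaintext: bytes):
    """Encrypt with a fresh random pad exactly as long as the message.

    Returns (pad, cypher_text). The pad must never be used again.
    """
    pad = secrets.token_bytes(len(plaintext))
    return pad, xor_bytes(plaintext, pad)


def otp_decrypt(pad: bytes, cypher_text: bytes) -> bytes:
    """XOR with the same pad recovers the plaintext."""
    return xor_bytes(cypher_text, pad)


if __name__ == "__main__":
    first = b"ATTACK AT DAWN"
    pad, first_cypher = otp_encrypt(first)
    print(otp_decrypt(pad, first_cypher) == first)

    # Incorrect use: the same pad protects a second message.
    second = b"RETREAT AT TEN"
    second_cypher = xor_bytes(second, pad)

    # The pad cancels out; an eavesdropper now holds first XOR second,
    # without ever having seen the pad.
    leaked = xor_bytes(first_cypher, second_cypher)
    print(leaked == xor_bytes(first, second))
```

Note that the pad is as long as the message and comes from the operating system's secure random source, which is exactly the requirement that makes this scheme so hard to use in practice.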
Most encryption algorithms, then, rely on mathematical operations which are easy to perform with the key but extremely hard to reverse without it, and this allows them to reuse key data in relative safety. I won't pretend to understand all the reasons why they work, but by building on hard problems (such as factoring the product of large prime numbers, in the case of some public-key cyphers) and on cryptographically secure random data generation techniques, a good cypher produces cypher text which would take extremely long periods of time for large networks of fast computers to analyze - but it is always at least theoretically possible to do. As technology progresses, and chips get faster and faster, it becomes easier to break yesterday's cyphers.
One defense against that is to increase the size of the keys being used; the more random key data is used, the more difficult the cracking process is. Also, cyphers will usually have some very subtle flaw which eventually is discovered, and can then be used to greatly speed up cracking cypher text - sometimes to the point where it becomes trivial. Because of that, new cyphers and new versions of existing cyphers are always being worked on.
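A little arithmetic shows why key size matters so much. The sketch below estimates how long it would take to try every possible key of a given size. The guess rate of one trillion keys per second is purely an assumption for illustration, not a measurement of any real attacker. It also ignores cypher flaws entirely, which is exactly why flaws are so dangerous: they let an attacker skip most of this work.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given size (worst case).

    On average an attacker finds the key after half this time.
    """
    total_keys = 2 ** key_bits
    return total_keys / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    assumed_rate = 1e12  # hypothetical attacker: one trillion guesses/second
    for bits in (56, 128, 256):
        print(f"{bits}-bit key: {years_to_search(bits, assumed_rate):.3g} years")
```

Each extra bit of key doubles the work. At the assumed rate a 56-bit key space can be searched in under a day, while a 128-bit key space is out of reach by many orders of magnitude.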
Secret Keys Must be Kept Secret
This is not only a requirement with the One-Time Pad; the secret key data used in all encryption methods must be kept secret. A key must be generated to encrypt/decrypt the data. That key will have to be stored somewhere, and it will have to be sent to someone else. It serves no purpose to use a high-security encryption method, but then send the secret key to decrypt your messages in clear text over the Internet; anyone who intercepts that key can then use it to read your messages. So, the key itself must be protected when transported. But does that not lead to a catch-22? You cannot encrypt something until your recipient has the key to decrypt it, but you have to encrypt the key itself to send it to them, but you can't do that... This problem was solved with Public Key Encryption (link), where a generated private key comes with a public key, and the two work as a pair to enable secure transmission of encryption keys themselves. More on this later.
Once you are in possession of a key, then, it must be kept safe for the duration of its use. This was the biggest problem with the One-Time Pad; huge amounts of secret key data had to be stored safely. Contrast this with the situation today with most cyphers, where a high-quality key takes anywhere from a few dozen bytes (symmetric keys) to a few kilobytes (asymmetric key pairs). But it's still important that this data be kept absolutely secure. For this reason, most systems will not use any particular key beyond a single session of communication, such as one browsing session on Amazon. When you first connect, a random key is generated, and transmitted using Public Key encryption. Under most circumstances, the keys being used are never stored on disk anywhere. However, many types of transmission do require keys to be stored, particularly when data is not being transferred in a real-time session, but rather in a disconnected manner. For example, if you want to be able to send encrypted e-mails to a business associate so that they can only be read by them, you must store the key being used to do this. Real-time generation of the key is not possible.
So, it is important that keys not be stored on disk when they need not be. But, when they must be stored, it is important to do so safely.
What Are You Really Encrypting?
In environments where you are not in control of the data at all points, consider exactly what it is you are encrypting, and why. As an example, some people believe that because they log on to their e-mail system via a secure web site, or because their e-mail client uses SSL/TLS to connect to the server, their e-mail is therefore secure. Generally, such systems only protect the connection between the user and their own mail server, which keeps the user's password from being discovered. Unless the content of the e-mail was itself encrypted by the sender before it left their computer, the e-mail may have been transmitted 'in the clear' over the Internet right up until the time it reached the user's server. For full security and privacy of all aspects of e-mail, the messages themselves must be encrypted, in addition to the log on to the e-mail servers.
'Random' is Not Always Random
A key must always be truly random. Some network systems which use encryption have, in the past, actually used a user's password to generate the encryption key. To put it mildly, this is a Bad Thing™. Even an otherwise decent user password can contain elements of predictability which can be identified by a cracking process and used to greatly speed up cracking any message encrypted using that key. One must remember that encryption is not broken by a person sitting and looking at streams of bytes, but by computers pre-programmed to look for patterns which a human would never recognize, and to try various tests against them to see if they can assist in the process of cracking the message or key.
But beyond a user's password being used, most programming languages have built-in 'random' functions which should never be used to generate encryption keys. I recently commented on the SANS/CWE list of 2009's Top 25 Most Dangerous Programming Errors (link) in a couple of articles, including one where I speak more on this topic. (link) I recommend reading that entry. Short version: use only cryptographically-secure random number generators for encryption key generation. Such generators take much longer to generate their random values than the functions meant to pick a random banner to display, choose how to shuffle the deck for Solitaire, or decide whether an orc swinging hits your head, kneecap, or misses entirely - but they are the only safe way to generate encryption keys.
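As a small illustration in Python (my choice of language here; the same distinction exists in most platforms), the standard `secrets` module draws from the operating system's cryptographically secure generator, while the `random` module is a general-purpose generator whose output can be reproduced by anyone who learns or guesses the seed. The function names below are hypothetical.

```python
import random
import secrets


def generate_key(num_bytes: int = 32) -> bytes:
    """Generate key material from the OS's cryptographically secure source."""
    return secrets.token_bytes(num_bytes)


def predictable_bytes(seed: int, num_bytes: int = 32) -> bytes:
    """DO NOT use for keys: a general-purpose generator, shown only to
    demonstrate that the same seed always yields the same 'random' bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(num_bytes))


if __name__ == "__main__":
    # Anyone who learns or guesses the seed can regenerate the "key".
    print(predictable_bytes(1234) == predictable_bytes(1234))

    # The secure generator has no seed for an attacker to guess.
    print(generate_key().hex())
```

The general-purpose generator is perfectly fine for shuffling a deck of cards; it is simply the wrong tool for keys.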
What are Symmetric and Asymmetric Encryption?
Symmetric encryption uses the same key to encrypt and decrypt the data, so both sides must have the same key. But this brings up a problem: how do you transmit that key privately? Asymmetric encryption uses a private key and a public key which is based on the private key. The public key truly can be 'public'. Anything encrypted with the public key can only be decrypted with the matching private key. So when person A sends their public key to person B, person B can encrypt data with that key and send it out. They will then know that only person A will be able to decrypt that message.
Asymmetric encryption is considerably more complex than symmetric. This applies to the asymmetric cyphers themselves, to the generation of the keys they use, to the time these cyphers take to encrypt/decrypt messages and, finally, to the effort required of programmers to actually use them. A fairly common mistake (overshadowed only by the mistake noted below) is to use asymmetric encryption where it is not meant to be used. Generally speaking, you should not use asymmetric encryption to actually encrypt a message or a stream of data. Instead, asymmetric encryption is usually used only to encrypt a symmetric key for use in the actual data exchange. (More on this below.)
Encryption is Not the Safest Way to Verify Passwords
That is, reversible encryption is not. Hash algorithms are a special type of encryption which is meant to be only one-way: the cypher text from a hash algorithm should never be able to be used to determine the plain text, except by comparison to known plain text values hashed in the same way - but that's part of the design. Some hash algorithms are known to be weak, and even strong hashing algorithms used improperly do not convey the intended security benefits. However, when all you need to do is verify that someone is sending the correct password or other secret, using a hash rather than a reversible encryption is the way to go.
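Here is a minimal sketch of password verification by hashing, using only Python's standard library. The specific choices are my assumptions rather than a recommendation from any standard: PBKDF2 with SHA-256, a 16-byte random salt per password, and an iteration count of 200,000. The salt keeps identical passwords from producing identical hashes, and the iterations slow down the "comparison to known plain text values" attack mentioned above.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware


def hash_password(password: str, salt: bytes = None,
                  iterations: int = ITERATIONS):
    """Return (salt, digest). Store both; never store the password itself."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-hash the candidate with the stored salt and compare digests."""
    _, digest = hash_password(password, salt, iterations)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("wrong guess", salt, stored))
```

Nothing in what is stored can be decrypted back into the password; verification works only by repeating the one-way calculation.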
Please; Do Not Try to Create Your Own Cryptographic Algorithm
Creating effective encryption algorithms is hard. It's so hard, in fact, that many people don't at all realize how hard it is. It is a common amateur computer programmer's mistake to create one's own "unbreakable encryption scheme". I've been there myself, actually. This is absolutely one of those areas where the idea of 'knowing enough to be dangerous' is true. Fortunately, this is a field of endeavor complex enough that such homegrown code is often not used far beyond those who created it. Still, a Google search for "my own encryption" (link) can turn up some interesting results.
In fact, even simply using an encryption algorithm correctly can be quite difficult, which is the reason for this article.
How to Use Encryption in Your Programs
So, I've covered some don'ts above, broadly speaking. Here are some of the things you want to do as you plan to use well-established encryption algorithms in your own programs.
Perhaps Don't Use Encryption at All
First, consider if you truly need to implement encryption. What I mean is, does the environment or API in which you are working already have built-in encryption you can use? Could you re-structure your application to utilize other services to accomplish the goal?
For example, all the most popular web servers support SSL/TLS encryption. A web application which desires to encrypt data should use this, of course. However, even if your n-tier application needs a richer client than a web browser, could you use Web Services over a secure connection? This would allow you to leverage the built-in support for encryption in the server and the client Web Services library. Your client and server application code need do almost nothing to get the benefits. Of course, that requires that you can use Web Services.
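To show how little code this can take, here is a sketch using Python's standard `ssl` module, assuming a plain TCP client that wants an encrypted connection to a server. The function names are mine, and the host you pass in is up to you. All of the cypher negotiation, key exchange, and certificate checking is handled by the library.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with the library's secure defaults:
    certificate verification and host name checking are both enabled."""
    return ssl.create_default_context()


def fetch_peer_tls_version(host: str, port: int = 443) -> str:
    """Connect to a server over TLS and report the negotiated version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_socket:
        # server_hostname lets the library check the certificate's name.
        with context.wrap_socket(raw_socket, server_hostname=host) as tls:
            return tls.version()


if __name__ == "__main__":
    ctx = make_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The application never touches a key or a cypher directly, which is rather the point.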
For local file storage, does the operating system have any built-in support for encrypting files? Before you protest about the reportedly weak security such systems have, consider the issue of where the key would be stored to decrypt the data. Think, also, about the value of the data versus the time it will take to implement something other than what exists already. If you need to implement your own system, using well-known secure algorithms, just be sure your efforts actually result in something better than what is available to you, and that the value of the data justifies the time you will spend.
Always Use Shared-Secret, Symmetric Encryption for Data
The actual encryption of data being transmitted should be done using standard, shared-secret symmetric encryption. Asymmetric (public key) encryption is only used to send the shared-secret key to be used for the rest of the data transfer. Remember; Asymmetric encryption was invented to solve the problem of sending the secret key which would be used for encryption/decryption over the same public network where the data is being sent, so that the two parties don't need to have some external way to send that key.
To do this, one side of the conversation will generate a public/private key pair, and will send the public key to the other side. Remember: that public key can literally be 'public'. It cannot be used to derive the private key. When the other side gets the public key, they generate a new symmetric or 'shared-secret' key, and encrypt that key with the original sender's public key, then send that cypher text to the original sender. The original sender then uses their private key to decrypt the shared-secret key, and that's it; they now both have the same shared-secret key, which they can use to symmetrically encrypt all the data they are sending back and forth this session. The public/private asymmetric key pair is no longer needed. Once the communication session is done (say, you finish your online shopping at Amazon.com), the generated shared-secret keys are discarded; new ones are generated and shared as described here in the future.
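The exchange just described can be sketched in a few lines of Python. Please read this as a toy only: it uses "textbook" RSA with two small, fixed, publicly known primes and no padding, purely so the example runs with nothing but the standard library. It is not secure, and every name in it is hypothetical. Real code should use a vetted library, which generates large random primes and applies proper padding.

```python
import secrets

# Two well-known Mersenne primes. Far too small (and far too public)
# for real use; chosen only so that this toy example is self-contained.
P = 2**127 - 1
Q = 2**89 - 1
E = 65537  # commonly used public exponent


def toy_key_pair():
    """Return (public_key, private_key) as (n, e) and (n, d) tuples."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    d = pow(E, -1, phi)  # modular inverse; requires Python 3.8+
    return (n, E), (n, d)


def wrap_session_key(session_key: bytes, public_key) -> int:
    """Encrypt a symmetric session key with the recipient's public key."""
    n, e = public_key
    m = int.from_bytes(session_key, "big")
    if m >= n:
        raise ValueError("session key too large for this key pair")
    return pow(m, e, n)


def unwrap_session_key(wrapped: int, private_key, length: int) -> bytes:
    """Recover the session key using the private key."""
    n, d = private_key
    return pow(wrapped, d, n).to_bytes(length, "big")


if __name__ == "__main__":
    # Side A creates a key pair and publishes only the public half.
    public_key, private_key = toy_key_pair()

    # Side B invents a fresh 128-bit session key and wraps it for A.
    session_key = secrets.token_bytes(16)
    wrapped = wrap_session_key(session_key, public_key)

    # Side A unwraps it; both sides now share the same symmetric key.
    print(unwrap_session_key(wrapped, private_key, 16) == session_key)
```

Notice the size check in `wrap_session_key`: the value being encrypted must be smaller than the modulus, so only small inputs such as a session key fit.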
Many people ask about the APIs for asymmetric encryption methods built in to various languages and frameworks; the APIs often have no apparent functionality for encrypting streams (such as a TCP/IP connection), and the data encryption code seems to generate errors when the plain text being input exceeds what seems like a fairly small amount of data. That is because, generally, these algorithms are not meant for encrypting data, per se, but are meant to share secret keys which are used to handle the actual encryption/decryption.
Exception That Proves the Rule?
Sometimes, other small data will be directly encrypted asymmetrically. For example, if a password must be sent over a network, and the password must be decrypted on the other side, it may be encrypted asymmetrically. This is reasonable as long as the data is not large and is not being sent at a high rate, as this method is slower.
However, even in cases where it seems like people are saying that public/private keys are actually being used to protect data, that is often a misunderstanding which comes out of discussion of certain encryption-related schemes where the private/public key combination must be stored. Because many of the things which will be encrypted actually go beyond real-time data transfers, such as encrypted e-mails, files, or databases, keys must be stored somewhere. Otherwise, the data could never be read again!
When encrypting data which must be stored, a symmetric or secret key is still usually used to encrypt the data. Then, that secret key is encrypted using the public key, and put at the beginning of the encrypted data (along with notations, if required, about how long it is, and what method was used to encrypt things). When the intended recipient is ready to decrypt the message, the private key is used just on the beginning of the data which has been noted as the encrypted symmetric key, which is then used to decrypt the message itself.
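The layout described above might look something like the following sketch. The container format here is entirely hypothetical (a 4-byte marker, a 1-byte method id, and a 2-byte key length, followed by the encrypted key and then the encrypted data); real formats differ. The encryption itself is out of scope, so both the wrapped key and the cypher text are treated as opaque bytes.

```python
import struct

# Hypothetical header: 4-byte marker, 1-byte method id, and a 2-byte
# big-endian length of the wrapped (public-key-encrypted) symmetric key.
HEADER = struct.Struct(">4sBH")
MAGIC = b"ENV1"


def pack_envelope(method_id: int, wrapped_key: bytes,
                  cypher_text: bytes) -> bytes:
    """Put the wrapped symmetric key in front of the encrypted data."""
    header = HEADER.pack(MAGIC, method_id, len(wrapped_key))
    return header + wrapped_key + cypher_text


def unpack_envelope(blob: bytes):
    """Split a stored blob back into (method_id, wrapped_key, cypher_text)."""
    magic, method_id, key_length = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an envelope in the expected format")
    key_start = HEADER.size
    key_end = key_start + key_length
    if key_end > len(blob):
        raise ValueError("envelope is truncated")
    return method_id, blob[key_start:key_end], blob[key_end:]


if __name__ == "__main__":
    blob = pack_envelope(1, b"<wrapped key bytes>", b"<encrypted message>")
    print(unpack_envelope(blob))
```

The recipient would decrypt only the wrapped key with their private key, then use the resulting symmetric key on the remainder.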
Working With Encrypted Data
Generally speaking, encryption algorithms never work with 'text', but with byte data. This is true despite the fact that we use the terms "plain text" and "cypher text". Likewise, encryption keys are never just 'text'. There are two constructs which are generally used, then:
- Byte Arrays: Obviously, an array of bytes is really what is being worked with. While many APIs may shield you from this detail, you are best to understand it, and you should know how to get your data into a byte array for encryption.
- Base-64 Strings: When cypher text or key data must be represented in a textual form, Base-64 is often used. This makes it easy to store or transmit the data as XML, for instance.
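Both conversions are short in practice. In this Python sketch (the helper names are my own), text becomes bytes by choosing an explicit character encoding, here assumed to be UTF-8, and arbitrary bytes become text-safe by way of Base-64.

```python
import base64


def text_to_bytes(text: str) -> bytes:
    """Turn text into bytes using an explicit encoding (UTF-8 assumed)."""
    return text.encode("utf-8")


def bytes_to_base64(data: bytes) -> str:
    """Represent arbitrary bytes (cypher text, keys) as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


def base64_to_bytes(encoded: str) -> bytes:
    """Recover the original bytes; reject characters outside the alphabet."""
    return base64.b64decode(encoded, validate=True)


if __name__ == "__main__":
    raw = text_to_bytes("hello")
    encoded = bytes_to_base64(raw)
    print(encoded)
    print(base64_to_bytes(encoded) == raw)
```

Keep in mind that Base-64 is only a representation. It provides no secrecy at all, since anyone can decode it.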
Key Storage; Secure, or Not at All
Seriously consider if keys must be persisted at all. Only if you are encrypting data which is stored for any length of time must a key be stored. You never need to store a key to encrypt a network data transfer session; generate a new key for each session, use public-key encryption to transmit the key, and then discard the key(s) when each session ends.
For cases where you do need to store a key long-term, be certain you are storing it properly. If the operating system has a mechanism, consider using it. Certainly be sure that the keys are stored in space available only to the required user(s).
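As one small example of "space available only to the required user(s)", here is a sketch that writes a key file readable and writable only by its owner. It assumes a POSIX-style system where file permission bits apply (on Windows they are largely ignored, and the operating system's own protected storage would be the better choice). The function names are hypothetical.

```python
import os
import stat


def save_key(path: str, key: bytes) -> None:
    """Create a new key file with owner-only permissions (0600).

    O_EXCL makes the call fail rather than overwrite an existing key,
    and the mode is applied at creation so the file is never briefly
    readable by other users.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as key_file:
        key_file.write(key)


def load_key(path: str) -> bytes:
    """Read the stored key back."""
    with open(path, "rb") as key_file:
        return key_file.read()


def is_owner_only(path: str) -> bool:
    """True if no group or world permission bits are set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

File permissions only protect against other users on the same machine; anyone who can read the disk directly, or who runs as the same user, can still read the key.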
Conclusion
This was meant just to provide some basics. My next article will cover more specifically how to use some of the encryption APIs which exist in the .NET Framework. I will cover using symmetric session keys to encrypt streams of data such as files or network transfers, using hashing algorithms to keep pre-shared secrets from needing to be sent in the clear, and properly using asymmetric encryption, such as the RSACryptoServiceProvider class.