When is Secure Code Not Secure Code?

by Andrew Barber 24. January 2009 15:20

Continuing my series on the SANS/CWE list of 2009's Top 25 Most Dangerous Programming Errors, I will cover three of the errors that involve apparently well-intentioned behaviors on the part of programmers, which can nevertheless end in disaster: Use of a Broken or Risky Cryptographic Algorithm, Use of Insufficiently Random Values, and Client-Side Enforcement of Server-Side Security.

It is unfortunately fairly common for people in many professions, certainly including I.T.-related ones, to operate from assumptions which are only half-correct. Most I.T. workers will agree with the statement that "computer security is an important topic". But from there, ideas, training and opinions fly in many directions, and unfortunately many of those directions are dangerous. You may have heard the statement, "Security by Obscurity is No Security at All". Literally speaking, this statement is wholly untrue: the whole basis of a good password, for example, is extreme obscurity. The idea is that a password is something so obscure that no one could possibly ever guess it. So obscure, in fact, that if you didn't know it yourself, you would not be able to guess it.

However, passwords are really the only place the statement fails in relation to security. Although security is often seen as an issue for Information Security (I.S.) teams and administrators, it starts with application developers (hence this series in the first place). The problem is, many novice - and frightfully too many experienced - programmers will take serious shortcuts when building security into their applications. Often, developers will believe they have found some truly obscure method to protect data that cannot be cracked, without realizing that the best encryption schemes, for example, are actually those which are fully known, and are therefore constantly and thoroughly tested. Other times, a developer will implement a security mechanism imperfectly, being unaware of which details are essential to the algorithm's ability to protect the data.

Broken or Risky Cryptographic Algorithms

I recently posted an article which discussed the MD5 hashing algorithm, and the fact that SSL certificates created using it may allow an attacker to impersonate a certificate authority. A hashing algorithm is a way to take any data (a file's content, a password, the contents of an e-mail) and produce a very large number which 1) will be effectively unique to that particular set of data and 2) cannot be used to determine what the original data was. This has numerous uses: verifying identities, verifying that file or message contents were not changed, and passing 'secret' information, such as a password, across a public network without someone being able to figure out the 'secret' being shared. MD5 has been known for some time to be 'broken': researchers can deliberately construct two different inputs that produce the same MD5 hash (a 'collision'), which defeats the first of those two properties and is what made forging a certificate possible. Yet some certificate authorities continued to use it.
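To make the definition concrete, here is a minimal Python sketch using the standard library's `hashlib`. I use SHA-256 rather than MD5 for the reasons above; note that `hashlib` will still happily compute MD5, so availability in a library says nothing about suitability. The function name `fingerprint` and the sample messages are my own illustrations.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"Please call me back at 555-0100."
tampered = b"Please call me back at 555-0199."

# The same input always produces the same digest...
print(fingerprint(original) == fingerprint(original))  # True
# ...while even a tiny change produces a completely different one.
print(fingerprint(original) == fingerprint(tampered))  # False

# Standard SHA-256 test vector for the input "abc":
print(fingerprint(b"abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```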

In addition to known broken algorithms, one trap that many programmers fall into is cooking up their own encryption scheme, thinking it cannot possibly be broken because no one but them knows how it works. This sort of thinking is one strong reason the "Security by Obscurity..." statement above is so often repeated. Encryption is a highly specialized field that involves mathematical, logical and critical skills and knowledge far beyond the ken of 99.999...% of even the best software developers. "Armchair Mathematicians" need not apply, but far too often, do. Normal, mortal (and otherwise brilliant) computer programmers often have difficulty merely implementing code which carries out an encryption algorithm, to say nothing of actually creating the algorithm in the first place.
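As an illustration of how quickly a homemade scheme can fall, consider a hypothetical 'secret' cipher that XORs each byte of the message with a repeating key. This is my own example, not one from the CWE entry. Assuming, for brevity, that the attacker also knows the key is four bytes long, guessing just the first four bytes of the plaintext hands them the key outright.

```python
from itertools import cycle

def homemade_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """A hypothetical 'nobody knows how it works' scheme: XOR with a repeating key."""
    return bytes(p ^ k for p, k in zip(plaintext, cycle(key)))

# Decryption is the same operation, because (p ^ k) ^ k == p.
homemade_decrypt = homemade_encrypt

secret_key = b"K3y!"
ciphertext = homemade_encrypt(secret_key, b"Dear customer, your new PIN is 4821.")

# The attacker guesses the message starts with "Dear" (a known-plaintext guess).
# Since c = p ^ k, it follows that k = c ^ p for each guessed byte.
guessed_prefix = b"Dear"
recovered_key = bytes(c ^ p for c, p in zip(ciphertext, guessed_prefix))

print(recovered_key)                                 # b'K3y!'
print(homemade_decrypt(recovered_key, ciphertext))   # b'Dear customer, your new PIN is 4821.'
```

Keeping the scheme secret did nothing here; the weakness is in the math, which is exactly what public scrutiny by experts is meant to catch.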

Thankfully, though, those of us without walls full of mathematical degrees never need to create such code. Both procedural and object-oriented programming languages come with built-in functionality implementing a wide range of cryptographic algorithms, and a careful searcher, willing to ensure proper verification is done, can find bleeding-edge - yet highly tested - new algorithms for those super-secret projects. The way a mere mortal programmer should show off their cryptography skills is in how correctly they implement and use the cryptographic algorithms that were made by true experts, and have been tested by other true experts and shown to be secure - for now.
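For password storage, "using expert-built algorithms correctly" might look like the minimal sketch below. It relies only on Python standard-library calls (`hashlib.pbkdf2_hmac`, `secrets.token_bytes`, `hmac.compare_digest`); the function names and the iteration count are my own illustrative assumptions. The library supplies the algorithm, but the programmer still has to get the details right: a fresh random salt per password, a deliberately slow key-derivation function, and a constant-time comparison.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (an assumption): choose it from current guidance
# and your hardware, and raise it over time.
ITERATIONS = 600_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage, using a fresh random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt; compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("Tr0ub4dor&3", salt, digest))                   # False
```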

This leads to the last point: in time, pretty much every algorithm is likely to be found to have a flaw which can be exploited by current technology. At one time, MD5 was thought to be unbreakable without "a billion years" of processing time. Now, it has been shown to be almost trivial to break for those in the know. Sometimes an algorithm can be rescued simply by using larger keys, and sometimes a flaw rules out the use of the algorithm altogether. Either way, programmers must design their applications to be modular in this (as in many) respects. If the code used today is 'secure', that's great. But if that method of encryption is ever broken, it must be easy to 'drop in' a different algorithm. Otherwise, the switch may never get done.
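One way to keep an algorithm replaceable is to store its name alongside every value it produces and look it up at verification time. The minimal sketch below does this for hashes; the `name$hexdigest` storage format, the `ACCEPTED`/`PREFERRED` settings and the function names are my own assumptions, while `hashlib.new` (which takes the algorithm name as a string) is standard Python.

```python
import hashlib
import hmac

# Algorithms currently trusted. Retiring one means deleting a single entry.
ACCEPTED = {"sha256", "sha512", "sha3_256"}
PREFERRED = "sha256"

def tag_digest(data: bytes, algorithm: str = PREFERRED) -> str:
    """Return 'algorithm$hexdigest' so the choice of algorithm travels with the value."""
    if algorithm not in ACCEPTED:
        raise ValueError(f"algorithm {algorithm!r} is not accepted")
    return f"{algorithm}${hashlib.new(algorithm, data).hexdigest()}"

def check_digest(data: bytes, stored: str) -> bool:
    """Verify `data` against a tagged digest, rejecting retired algorithms."""
    algorithm, _, hexdigest = stored.partition("$")
    if algorithm not in ACCEPTED:
        return False  # e.g. legacy 'md5$...' values: force a re-hash instead
    expected = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(expected, hexdigest)

stored = tag_digest(b"quarterly report contents")
print(stored.split("$")[0])                                # sha256
print(check_digest(b"quarterly report contents", stored))  # True
```

With this layout, moving to a successor algorithm is a one-line change to `PREFERRED`, and older values can be re-hashed the next time the original data is available.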

Use of Insufficiently Random Values

Let me start with a statement someone will probably argue against: computers cannot generate random numbers. I should say, the binary computers in use today cannot generate random numbers through computation alone. Our computers are specifically designed, in fact, never to behave randomly: the same program given the same input produces the same output. In an extremely literal sense, one could even say that nothing is ever random at all. Instead, one might note that some events simply present a human observer with no way to predict them, thereby creating an illusion of randomness. Many definitions of the word random include this notion of unpredictability.

"But Andrew," I hear you say, "my computer does all sorts of things at random! I have this game and there are random encounters in it. Another game gives a random chance to succeed at something. Windows Solitaire has to have randomness built in, or how could it shuffle the cards??" The answer to this lies in what I said above: some events simply present a human observer with no way to predict them. In order to 'fake' randomness (called pseudo-random number generation), software will take input from sources the user cannot readily observe. A highly oversimplified example (and, in fact, a common source of the very mistake this section is about) is a number representing the computer's current date and time, down to the millisecond. Certainly, no human being can possibly guess what exact millisecond their computer thinks it is at any given moment, right? So, take that number, feed it through some operation (perhaps a simple bitmask or modulus) that reduces the huge number to one in the range requested, and voilà! Instant randomness!
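The oversimplified approach just described fits in a few lines of Python. This is a hypothetical sketch (the function name is mine) of exactly what not to do: the result is a pure function of the clock, so anyone who knows or can guess the millisecond knows the "random" roll.

```python
import time
from typing import Optional

def naive_die_roll(now_ms: Optional[int] = None) -> int:
    """Hypothetical clock-based 'random' die roll. Do NOT use for anything real."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000  # current time in milliseconds
    return now_ms % 6 + 1  # reduce the huge number to the range 1..6

# 1232810400000 ms is 24 January 2009, 15:20:00 UTC.
print(naive_die_roll(1232810400000))  # 1
print(naive_die_roll())                # depends on when you run it
```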

In fact, programs that need random numbers for purposes other than security will often do something roughly similar. When a game starts up, for example, it may start with a 'seed' number (that is, a number which starts the randomization process) based on the system time and other high-precision data that is easily available. This seed is passed into the randomization algorithm, which generates the next number in the series - again, a very large number. That result is reduced, perhaps again with a modulus or similar method, to a number within the range needed (say the program needs to simulate a roll of a die, so 1 through 6), and the reduced result is output. The whole, original result number, however, is saved to use as the input value the next time a random number is required.
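The seed, generate, reduce, save cycle above can be sketched with a linear congruential generator (LCG), one of the simplest pseudo-random algorithms. The class name and usage are my own; the constants are the commonly cited ones from "Numerical Recipes". Real libraries generally use stronger generators (Python's own `random` module uses the Mersenne Twister), but none of these are meant for security.

```python
import time

class TinyLCG:
    """A linear congruential generator: illustration only, NOT secure."""

    # Multiplier, increment and modulus of the LCG popularized by "Numerical Recipes".
    A = 1664525
    C = 1013904223
    M = 2**32

    def __init__(self, seed: int):
        self.state = seed % self.M  # the seed starts the sequence

    def next_raw(self) -> int:
        # Generate the next (large) number and keep it as the new state.
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def roll_die(self) -> int:
        # Reduce the large number to 1..6; the full state is saved for next time.
        return self.next_raw() % 6 + 1

# Seeded from the clock at start-up, as a game might do.
game_rng = TinyLCG(seed=time.time_ns() // 1_000_000)
print([game_rng.roll_die() for _ in range(5)])

# The same seed always reproduces the same "random" rolls.
a, b = TinyLCG(seed=2009), TinyLCG(seed=2009)
print([a.roll_die() for _ in range(5)] == [b.roll_die() for _ in range(5)])  # True
```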

This sort of pseudo-random number generator works perfectly well for games, screen savers, or anything else that requires fast generation of numbers that won't be a big security risk if someone can predict them... because they can be predicted. How? It was in the explanation: the initial seed value. If you find a way to make the program start with the same initial seed for its random number generator, you will always get the same series of numbers from it. Also, if you know which random number algorithm is used and what the seed value is, you can generate those numbers yourself and predict the program's so-called 'random' numbers. It may be impossible for a normal human to know the exact time, down to the millisecond, when they double-click on a game's icon. However, it is not at all impossible for a cryptanalyst to obtain data such as that. In fact, the operating system may readily give up that information in response to the right kind of query.
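To show how the seed gives the game away, here is a sketch of the attack using Python's standard `random` module, whose documentation warns that it is not suitable for security purposes. The scenario is an assumption of mine: the program seeds itself with its start time in milliseconds, the attacker knows that start time to within five seconds, and the attacker can watch 20 consecutive die rolls. The function names are hypothetical.

```python
import random

def first_rolls(seed: int, count: int) -> list[int]:
    """The first `count` die rolls a program seeded with `seed` would produce."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(count)]

def recover_seed(observed: list[int], window_start_ms: int, window_ms: int):
    """Try every millisecond seed in the window until the rolls match."""
    for candidate in range(window_start_ms, window_start_ms + window_ms):
        if first_rolls(candidate, len(observed)) == observed:
            return candidate
    return None

# The victim program seeded itself with its start time in milliseconds.
actual_seed = 1232810400000 + 1234          # unknown to the attacker
observed = first_rolls(actual_seed, 20)     # the attacker watches 20 rolls

# 5,000 guesses is nothing for a computer.
seed = recover_seed(observed, window_start_ms=1232810400000, window_ms=5000)
print(seed == actual_seed)          # True
print(first_rolls(seed, 25)[20:])   # the next five rolls, predicted in advance
```

Twenty rolls are used because a handful of rolls could match several candidate seeds by coincidence; with 6^20 possible sequences, a false match in a 5,000-seed window is vanishingly unlikely.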

What's That Got to do With Encryption?

Encryption relies on random numbers in many ways. For example, the proper use of hashing algorithms for transmitting private data usually includes a random 'salt' value (in a challenge-and-response exchange like this one, more precisely called a 'nonce' or 'challenge'), which is added to the value being hashed so that the resulting hash is different each time, say, you send your password over the network. Without the salt, a computer cracker 'listening' to your transmission of your hashed password could simply re-send that same value to the server when trying to log on to it. They would never learn your password, but they would not need it when the server merely asks for a single hash. But suppose the server instead says, "take your password, add the salt value afkuh2345uoiygaw to it, then send me the hash of the combined value". Then the cracker who overheard your response is no closer to knowing what to send when they try to log in as you and the server challenges them with, "take your password, add the salt value ZQAQkjQG$GAGF97 to it, then send me the hash of the combined value". However, if the cracker can find some way to predict which random values the server will challenge you with each time, they may have a leg up.
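Here is a minimal sketch of the exchange just described, using Python's standard library. Two substitutions are mine: the response is computed with HMAC-SHA256 (`hmac.new`) rather than by hashing a simple "password + salt" concatenation, and the server's random value comes from the `secrets` module so it cannot be predicted. For simplicity it also assumes the server can look up the user's password; a real protocol would avoid storing it in plaintext. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def new_challenge() -> str:
    """Server side: an unpredictable, one-time value for each login attempt."""
    return secrets.token_hex(16)

def respond(password: str, challenge: str) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge.encode("ascii"), hashlib.sha256).hexdigest()

def check(password: str, challenge: str, response: str) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(respond(password, challenge), response)

challenge = new_challenge()
answer = respond("hunter2", challenge)          # what travels over the network
print(check("hunter2", challenge, answer))      # True

# Replaying the overheard answer against a fresh challenge fails.
print(check("hunter2", new_challenge(), answer))  # False
```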

This is where something a stage above pseudo-random number generation comes in. Cryptographically secure random number generation goes further, generating numbers which could not reasonably be predicted by anyone observing a computer over the network. I won't even pretend to know exactly where such methods get their 'seed' values from, how often they might change the seed (if they do), and such. But I know that the cryptographic experts mentioned above say that you must use sufficiently random generation for any value used in a security-related context. The flip side is that the algorithms which both seed and generate this type of 'nearly-true' random number are typically slower than the ones which generate ordinary pseudo-random numbers. They can also be somewhat more difficult to use, because they often require extra setup steps, either in the code of the program using them or even on the operating system of the computer where the software will be installed.

When installing an SSL package on a server, for example, an administrator may find that the SSL package cannot start generating the random numbers it requires because the system does not yet have enough entropy (unpredictable input data) to start from. Typically, a computer gathers this from as much external information as possible; perhaps in part by running an algorithm over the time between keypresses on the keyboard, coupled with which keys are being pressed, how the mouse is being moved, and the temperature of the CPU and motherboard. The point is to gather a very large volume of data which neither the computer nor anyone else can predict, which typically is not recorded, and which, in fact, no human could deliberately recreate.
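In Python, for example, this split maps onto two standard-library modules: `random` (a fast, seedable Mersenne Twister that its documentation says is unsuitable for security purposes) and `secrets`, which draws on the operating system's cryptographically secure source. The variable names below are illustrative.

```python
import random
import secrets

# Fine for games and simulations: fast, but predictable from its seed/state.
game_roll = random.randint(1, 6)

# For anything security-related, use the operating system's secure source.
session_token = secrets.token_urlsafe(32)   # 32 random bytes as URL-safe text
challenge = secrets.token_hex(16)           # 16 random bytes as 32 hex digits
secure_roll = secrets.randbelow(6) + 1      # uniform over 1..6
deck = list(range(52))
secrets.SystemRandom().shuffle(deck)        # random.Random API, OS-backed

print(game_roll, secure_roll, challenge, session_token)
```

The calling code looks almost identical either way, which is why the choice of module deserves a deliberate look in any security review.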

When creating a program which uses an excellent, pre-made encryption algorithm with a reasonably large key or salt size, the programmer must also be sure that the random values generated in the encryption process are unpredictable in this way. Otherwise, patterns can be (and are) predicted by crackers and cryptanalysts.

Client-Side Enforcement of Server-Side Security

Finally, a problem I recently came across, and see terribly frequently. I almost included this in the previous entry, The Client is in the Hands of the Enemy, but that was already going to be very long, and it applies just as much here. One issue that programmers try to be careful with in any sort of client-server software (which includes a web site, the web browser being the 'client') is how much data is being sent back and forth between the client and server. To the extent practical (and safe!), the flow of information should be kept to a minimum, so that the available bandwidth is not saturated. A common - and very good - practice in pursuit of this goal is client-side validation. Put simply, the client code validates the information entered, so that if there is an error (say, a user does not enter anything in the 'phone number' field of a request for a tech support callback), the client can alert the user to it before it even sends the information to the server. Otherwise, the server itself would have to complain about the error, resulting in a 'round trip' of data flowing back and forth and wasting bandwidth. By having the client perform that check, the excess bandwidth usage can be eliminated.

The problem is, many developers - whether creating a web site or a desktop client-server program - mistakenly believe that the client can become the sole point of validation, even for security-related checks. One example I worked on recently involved a web site which granted full administrative access for editing the site's content based solely on a value of "1" being present in a cookie of a certain name. This cookie was supposed to be set by the server once the user had entered the correct user name and password on the site, but that simple cookie value could have been placed into a browser manually by anyone with a text editor. Similarly, another site indicated you were logged in with a cookie value set simply to your account number. Again, all one had to do was find where to manually edit the browser's cookies and set the appropriate cookie to someone else's account number, and they would be logged in 'as' that other person. A final example is a custom application which allowed its users to select a certain number of 'free gifts' from among those available. However, only the client enforced the limit on gifts. Anyone observing the data transferred between client and server could have injected 'extra' data to claim every single gift available (which was 20 times the limit they were allowed), and the server dutifully passed on the orders for each of those gifts to the third-party suppliers.
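The usual fix for the two cookie examples is for the cookie to carry only an unguessable identifier, with everything it stands for (which account, whether it is an administrator) kept on the server. The sketch below assumes an in-memory dictionary as the session store and uses hypothetical function names; in practice, a web framework's own vetted session support should be preferred.

```python
import secrets
from typing import Optional

# Server-side session store: token -> what the server knows about the user.
# A dict keeps the sketch self-contained; real systems use a database, a
# cache, or the framework's session support.
SESSIONS: dict = {}

def log_in(account_id: int, is_admin: bool) -> str:
    """Call only AFTER the server has checked the user name and password."""
    token = secrets.token_urlsafe(32)  # unguessable; means nothing by itself
    SESSIONS[token] = {"account_id": account_id, "is_admin": is_admin}
    return token  # becomes the cookie value sent to the browser

def current_user(cookie_value: Optional[str]) -> Optional[dict]:
    """Look the cookie up on the server instead of believing what it says."""
    if not cookie_value:
        return None
    return SESSIONS.get(cookie_value)

def require_admin(cookie_value: Optional[str]) -> dict:
    user = current_user(cookie_value)
    if user is None or not user["is_admin"]:
        raise PermissionError("administrator access required")
    return user

admin_cookie = log_in(account_id=1001, is_admin=True)
print(require_admin(admin_cookie)["account_id"])  # 1001
print(current_user("1"))      # None: a hand-typed "1" is not a session
print(current_user("1002"))   # None: neither is someone's account number
```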

Client-side validation is a very useful tool to make an application more responsive to the user, and reduce bandwidth usage on the server. However, all of the same validation must also be performed on the server. Input from the client must never be trusted to be safe.
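A minimal sketch of repeating the checks on the server, covering the tech-support callback form and the free-gift limit from the examples above. The limit, the gift catalog, the deliberately loose phone-number pattern and the function names are all hypothetical.

```python
import re

# Hypothetical business rules for the examples above.
MAX_FREE_GIFTS = 3
AVAILABLE_GIFTS = {"mug", "t-shirt", "pen", "sticker", "cap"}
PHONE_PATTERN = re.compile(r"\+?[0-9 ()\-]{7,20}")  # deliberately loose

def validate_callback_request(form: dict) -> list:
    """Re-check the callback form on the server, whatever the client did."""
    errors = []
    phone = str(form.get("phone") or "").strip()
    if not PHONE_PATTERN.fullmatch(phone):
        errors.append("A valid phone number is required.")
    return errors

def validate_gift_order(requested: list) -> list:
    """Enforce the gift limit on the server before any order is placed."""
    errors = []
    if len(requested) > MAX_FREE_GIFTS:
        errors.append(f"At most {MAX_FREE_GIFTS} free gifts may be chosen.")
    if len(set(requested)) != len(requested):
        errors.append("Each gift may be chosen only once.")
    unknown = set(requested) - AVAILABLE_GIFTS
    if unknown:
        errors.append("Unknown gifts: " + ", ".join(sorted(unknown)))
    return errors

print(validate_callback_request({"phone": ""}))         # ['A valid phone number is required.']
print(validate_gift_order(["mug", "pen"]))               # []
print(validate_gift_order(sorted(AVAILABLE_GIFTS) * 4))  # over the limit, and duplicates
```

The server would reject the request, and place no orders, whenever either function returns a non-empty list, regardless of what the client-side code already checked.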

Conclusion

It is important for a developer always to remember that it's not good enough just to know that "security is important", but that the ever-changing landscape requires careful attention to detail and repeated review of security-related algorithms used in applications. As usual, programmers can find lots of built-in code to help make their applications truly secure in most modern languages, development environments, application frameworks and standard code libraries. But they must be used - and they must be used properly.

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent those of my partners, clients or contractors in any way.

© Copyright 2012 AndrewBarber.com