A salt is an additional sequence of data (bits or characters, however you want to think about it) that is appended to a password before hashing the password and storing the hash in a database.
There are three types of salts that can be added to a password. Assume the password is passwd! and the salt is 12345, here are the three types described in a table.
Salt Type |
Salted Password |
SHA256 hash |
unsalted |
passwd! |
12f225551a043b6e136b2cf03546b06efb289d29ab42cebfd78ee101d8555304 |
Prefix |
12345passwd! |
b38c49760d484119f227fab640cb1415d58f765c0727dc9ad7e0a5a66003d041 |
Infix |
pass12345wd! |
a9fa7c693b691c8ceded848877afdb1259f28f194863ebb33c07c4bbfa1ff04c |
Postfix |
passwd!12345 |
1315abea2576c3b6270712df587359d614ae1a1698d69dd2175459e411f678f5 |
Of course you can use multiple of these in any combination as well. You can surround the password by using both a prefix and a postfix. You can use all three. You can have multiple infixes, so a password of 12 characters such as MyPassword1! could become MyP12345assw12345ord1!.
Reasons for salting
Salting was created because hashing is imperfect. Hashing has multiple problems. However, I am only going to describe two problems and how salting improves the security in each situation.
Problem 1 – Rainbow tables
A database of hashes for all character sequences up to a certain number of characters can be generated over time. This database is called a rainbow table.
Scenario
Let say you registered at site ABCStore.tld. Some time later, ABCStore.tld reports that their database was compromised. Your password was not stored in the database, but the hash of your password was. Lets say your password was passwd! and you use it on every site on the internet.
With a rainbow table, your password will be discovered in moments and that password will work on other sites perfectly.
How a Rainbow table works
It works by generating a table of passwords and their matching hashes. As each character is added to a password, the total possible password options increases, making the next number of passwords hard to generate. For an English keyboard, there are 26 upper case letters, A-Z, 26 lower case letters, a-z. There are ten numbers, 0-9. There are 33 special characters, 34 if you include tab. There are other characters available, but the majority of people who speak English, those are the only characters used in a password. That is 95 possible options for a password (96 if you include a tab).
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 0
~ ` ! @ # $ % ^ & * ( ) - _ = + [ { ] } \ | ; : ' " , < . > / ?
So for every character in length the password is, the possible values grow exponentially.
Password Length |
Possible combinations |
6 |
96^6 |
782,757,789,696 |
7 |
96^7 |
75,144,747,810,816 |
8 |
96^8 |
7,213,895,789,838,336 |
9 |
96^9 |
692,533,995,824,480,256 |
10 |
96^10 |
66,483,263,599,150,104,576 |
11 |
96^11 |
6,382,393,305,518,410,039,296 |
12 |
96^12 |
612,709,757,329,767,363,772,416 |
However, it is actually easier than this. Most passwords are dictionary words, so a Rainbow table can be created with dictionary words first. Now a large portion of passwords are covered.
Then because most passwords are lowercase characters only, a hacker can have their rainbow table created by first creating only the options using all lowercase letters. Notices the numbers are significantly smaller.
Password Length |
Possible combinations |
6 |
26^6 |
308,915,776 |
7 |
26^7 |
8,031,810,176 |
8 |
26^8 |
208,827,064,576 |
9 |
26^9 |
5,429,503,678,976 |
10 |
26^10 |
141,167,095,653,376 |
11 |
26^11 |
36,703,444,86,987,776 |
12 |
26^12 |
95,428,956,661,682,176 |
Also it is common to only put numbers on the end of a password, so they can then do all lowercase passwords with one number at the end, then two numbers at the end. It is also to common to replace letters with number, E = 3, l or i = 1, S = 5, etc. It is also common to only capitalize the first letter, so that can be done too. So as you can see, by doing far less than generating all passwords, a rainbow table can be created that matches almost all passwords used.
Password Creation Hint: When creating your own password, avoid these trends and use long passwords. If it is allowed, use a sentence with punctuation as a password if you can such as this:
I love my wife, Michelle!
Sentence passwords are easy to remember, easy to type, and are freaking hard to populate into a rainbow table. The above sentence is 25 characters. Each space, the comma, and the ! are special characters, that is 6 special characters. There are two capital letters, one not just at the front.
So once such a rainbow table is created, if your password is easy, your password is just a select statement away from a hacker who compromises a database.
Select * from RainbowTable where hash=?
The hacker now knows your actual password and can use that on other websites, like your bank account. You scared now? You should be!
How a salt improves security against a Rainbow table
Remember our scenario. How is that scenario less of a security concern because salting was used? The answer is fairly simple. Because salting changes the string being hashed, the hash is different. So a look-up in a Rainbow table will be less likely to return a result. The probability is high that even if a simple password was used, the rainbow table will not return a result. If a result is returned, the likelihood that the returned string is a collision not your password was increased.
Note: If the salt is known, or can be figured out, obtaining your password is still possible, but it is harder and will take longer. If you change your passwords often enough, you only need the building of the Rainbow table to take longer than the time you use your password.
Problem 2 – Collisions
Another problem with hashing is that there is a possibility, however remote, that two completely different strings return the same hash.
Scenario
You register using the same username and password at both ABCStore.tld and XYZStore.tld. Two sites don’t use a salt, but they both hash your password with the same hashing algorithm. ABCStore.tld reports that their database was compromised. Your password was not returned, but a collisions was returned.
Even though it is not your password, the collision string can be used to authenticate at XYZStore.tld also.
How a collision occurs
Because a hash is limited in the amount of characters it generates, but there are an infinite amount of character sequences, it stands to reason that infinite is greater than anything finite, once all hashes are used, the next hash created from a unique sequence will be a duplicate.
(Read this: What’s the shortest pair of strings that causes an MD5 collision?)
Lets make the numbers ridiculously small so you can understand easier. Imagine a hash that changes everything to a single digit number, 0-9. Once all ten hashes are used, the eleventh hash created is guaranteed to be a duplicate.
With SHA-256, the rules still applies, only there are a plenty of unique hashes, so collisions are extremely hard to calculate, even for multiple computers over a long period of time (at least today in June, 2012).
How salting improves security against collisions
If we look at our scenario where a hacker was able to determine a collision for the password used, not the password itself, they may not even have the salt string.
Well, if salting were used by both ABCStore.tld and XYZStore.tld, and both used different salts, then the collision that worked for ABCStore.tld doesn’t work from XYZStore.tld because they used different salts, and therefore the passwords aren’t the same. So that means that the hackers have to calculate your actual password, which is harder to calculate.
Even better, the hacker wouldn’t even be able to log into to ABCStore.tld because the login system will append the salt to the entered password, so having a collision doesn’t help, because after entering the password, the collision gets the different salt appended to it and it is no longer the same and is not a collision on another system.
Less-effective salt implementation
Salting works. However, it is often implemented in a way that is less-effective. Remember, “less-effective” is still somewhat effective. Let’s look at some less-effective uses of salting.
Using a single salt for all passwords
Look at the following table of passwords. Can you easily tell which is the salt and which is the password?
123456789passwd! |
123456789myPass12 |
123456789littleJohn |
123456789jennifer |
123456789secret100 |
The salt is 12345679. Now because the salt is known, a hacker can now create a rainbow table using the salt.
This is “less effective” because it is still somewhat effective. This method is just not as effective as it could be.
According to scenario 1, it is effective because the standard rainbow table will not return a result. The hacker is now forced to create a new rainbow table. However, it is less effective because the hacker only has to create one more Rainbow table, and they can probably get all passwords with a partial rainbow table.
According to scenario 2, finding a collision will not allow authentication to another site with a different salt or a site without a salt. However, because the hacker knows the salt, they can find a collision from which they can remove the salt, and that leaves them with a collision string that works on ABCStore.com.
Using a static hash resource in the code
If a static hash is in the code, it is possible that this value can obtained from a binary or by compromising the code itself. Storing the hash in the code is effective as if the database is compromised, the hash was not in it.
It is somewhat effective in that the hacker now has to figure out the hash on their own. It is less-effective in that the code could be what was compromised.
Publishing the salt in the database
Since a single salt is “less effective” the author of an authentication system may decide to use a separate salt for each user. However, now he has so many salts, he needs to track them somewhere. The salt becomes a column, usually next to the password column, and usually the column is clearly titled: “Salt”.
According to scenario 1, this is still “effective” in that now a hacker must generate a Rainbow table for each user. Why is it still less effective? Because the salt is known, and only one password per Rainbow table is needed, a complete rainbow table is not needed. As soon as a string is found matching the hash, that Rainbow table can be abandoned. A dictionary designed to try high probability passwords first can significantly limit the time needed to find the average person’s passwords, which are usually simple and common dictionary words or names. Most passwords can be determined in a short amount of time. The time is of course based on resources. Of course, with cloud services, resources are cheap and powerful.
Using an existing user value as a salt
Instead of using a column called “Salt’, an existing known column, such as the username, first name, or last name is used. Of course, you wouldn’t use the actual value in the known column, it is probably too short. You would use a hash of the first name or a cryptographically random character sequence added to the value in the known column.
This is effective in that the hacker has to figure out which column is used as the salt. It is less effective because the hacker only has to do this on the first user and once the column is determined, for the rest of the users it is the same as publishing the salt in the database.
More effective implementations
One of the main failures of salting is the common idea that the salt is not secret. A secret or calculated salt increases security.
Salts should be unique per user and they should be hard and as close to impossible to figure out. Here are some ways that a hacker may never figure out the salt, and never be able to create a Rainbow table using it. Something that provides the security as close to that of a one-time pad as possible.
Use a variably located calculated salt
There is not one salt, there are one or more salts. The salts don’t exist in the code, nor do the exist in the database. However, it is a combination of both. And the salt is not always a prefix, infix, or postfix, or any combination of the three, however, the number of salts and the location(s) are determined by one or more calculations.
There are seven possible salt combinations, however, because the infix could have multiple locations as well, the salt locations could be many more.
- prefix
- infix
- postfix
- surround (prefix and postfix)
- prefix and infix
- infix and postfix
- all (prefix, infix, and postfix)
When there is an infix, it could be multiple infixes every n* characters where n could even be calculated so a 16 character password could have 1-15 infixes.
The number of salts and their locations and where they go could be determined with a calculation as simple as this:
SaltLocations = Firstname % 7
Of course a more complex method could be used as well. Again, the number of infixes could be calculated. Each salt location should then have a separate salt and each salt should be created using a separate calculation. Also, make sure both the code and the database are used for the calculations. Even a simple algorithm makes life harder on the hacker. Go ahead and create a column called salt in the database, however, only store a random character sequence there.
Salt (database part) = First letter of first name + random character sequence + first letter of last name
This can be made even more complex by altering this using a function.
Salt (code part) = function(First two letters of first name + random character sequence + first two letters of last name)
Even if the function in code is a simple shift one bit function, the hacker will not know the function and will have difficulty figuring it out. Even if they figure out one salt, will they be able to figure out the number of salts, what each salt is, and where each salt is used.
This is more effective because the hacker will have to compromise both the database and code to figure out each salt and often the database and the code are stored on separate servers.
Use a variably located calculated salt including information outside the database and the code
No matter how many algorithms you use or how complex they are, if the hacker has the code and the database, they usually have the salt. What if you could change that? So what if having the code and the database is not enough.
External File
What if the code was obtained and the database was obtained but on inspection of the code, a file on the system, located nowhere near the code, held a value that was part of the hash. Now that file has to be accessed too. This file could be anywhere. On the same server, or on a separate server. It could be accessible by different credentials.
Salt = function(First two letters of first name + random character sequence + first two letters of last name + GetInfoFromFile())
Web Service
What if the calculations were performed on a separate server entirely using a web service and lets assume the web service is secure.
hash = Remote_function(password, First two letters of first name + random character sequence + first two letters of last name + GetInfoFromWebService())
The web service could also determines where to place the hash, in any of the seven ways mentioned, so compromising the local code no longer benefits the hacker.
However, adding a third server is costly and more complex and adds additional need for security. For example, if you don’t add security and the hacker gets the database and code, they could simply call the same web service. Adding security such as ‘source IP address of the calling server’ would be effective.
Now the database, the code, and the third-party server all have to be compromised by the hacker for them to determine your salt and then your password or a collision of your password.
Hashing your hash a calculated amount of times
Ok, so if you do everything above, you can still make life even harder for a hacker by hashing your hash a number of times.
long RoundCount = CalculateRoundCount();
hash = hash(saltprefix + password)
for (long i = 0; i < RoundCount; i++)
hash = hash(hash)
Since we calculate the RoundCount so that every user in the database was hashed a different amount of times, it is difficult for the attacker to know how many hashes were run and so they have to try them all.
Kerckhoffs’s principle
Kerckhoffs’s principle states that a cryptosystem should still be secure even if everything about the system, except the key, is public knowledge.
This principle is abused by some security engineers in multiple ways:
- they assume that only one key can or should exist.
- they apply Kerckhoffs’s principle, which is for a single cryptographic action, to an entire multi-part software system.
- they claim that a security measure is useless if that security measure relies on another part of the system being secret.
So is a calculated hash security by obscurity? Yes an no. This is security by multiple secrets. Security by multiple secrets is not necessarily the same as security by obscurity. Security by obscurity provides the idea that the code could be riddled with vulnerabilities but because the code isn’t known, it doesn’t matter, it is secure by being unknown. If you don’t know the hash, can you create valid rainbow table? No, you have to create the entire table and move to brute force. So it is security by making the hash a secret. Does it improve the hashing algorithm as a single cryptographic action? No. Does it require the attacker to build a bigger database of hashes? Yes. Does it make it more likely the hash they match with is a collision and not the user’s actually password? Yes.
Look, if the whole system were known, that would include the key or secret. Kerckhoffs’s principle explicit states that everything but the key or secret is known. His statement is true. It is best if that is the case. But that doesn’t mean security can’t be better for any given situation where that is not the case.
The best way to check if you are making your system more secure, is very similar to how you define features for a product: in stories. Barneck’s Security Story Principle (Barneck is me 😉 states this: Any security increase is valid if the security increase is for a hacker story that is valid.
So let’s look at some stories. Is security increased by making the salt secret?
Story |
Security Measure Taken |
Security is increased for story |
A hacker gets access to only the database data. |
The salt is secret in the code. |
Yes |
A hacker gets access to the database and the code. |
The salt is secret in the code. |
No |
A hacker gets access to the only the database data. |
The salt is secret and not in the database or code. |
Yes |
A hacker gets access to the database and the code. |
The salt is secret and not in the database or code. |
Yes |
If the salt is a secret, and the algorithm that calculates the salt uses a secret so the salt can’t be recalculated, then the system is more secure than it would be if the Salt were known. Even if the system is open source, if the open source software was written so the algorithm that creates the hash uses a secret that is generated by the user and is different for every install and not static in the source, then the attacker must first discover the secret to the salt before they can begin to use a rainbow table to find a collision.
There is a belief that cryptography should involve as few secrets as possible. However, one single secret is also a problem. What if that one secret is discovered. Shouldn’t the redundancy principle apply to secrets? Isn’t there a backup secret so discovering one secret is never enough? We inherently know that multiple keys and doors work better than one key and door. This is why we create games that have mazes where we have to find one key before we can find another.
Think of it as an arms race. Making the salt secret might not make your hashing algorithm any stronger. But does it have to? Isn’t it enough to make it harder for the hacker to make a rainbow table? Since a hacker is using a rainbow table, isn’t it valid to respond by attacking their ability to make a rainbow table? Yes it is. Time is a very important dimension in security and any measure taken that significantly increases the time required to hack a system is valid.
Now, as an exercise for you, what if the following calculations were done using separate secrets stored in separate places?
- Hash
- Hash location(s) – prefix, postfix, infix, or multiple
- Number of hash rounds
How does having separate secrets improve your system’s security? What administration concerns does it create?
Conclusion
Salting is effective. Even when a salt is implemented in a less effective manner, it is still better than having no salt at all.
Typically the salt is considered “not secret” but I argue that a hacker who compromises a database, the code, or both, should still have difficulty determining the salt or how the the salt is added to a password as that information allows for an easier creation of a rainbow table.
Implementing salting in these more effective ways can increase complexity, making it unlikely a hacker could reverse engineer a password using the hash or find and use a collision.