Preamble
Hashes are encoded representations of a variety of inputs (be they integers, strings, objects from programming languages, etc.). They are frequently employed and associated with hash tables, thus serving as a means for quickly storing and accessing data in near constant time – O(1) – as opposed to other conventional data structures (such as binary trees, heaps, and linked lists).
Hashes employ complex mathematics (known as hashing functions) in order to convert their inputs to some numerical output. These outputs (depending on the complexity of the hashing function) can vary in size. Effective hashing functions are:
- Consistent: all else being equal, the same input should produce the same hash every time.
- Efficient to compute: to be functionally useful and fast.
- Produce uniformly distributed output: this prevents collisions – or similar hashes from dissimilar inputs.
Hash functions used for cyber security – or cryptographic hash functions – are designed to keep data secured as one-way functions; the outputs from these hash functions have undergone such complex mathematical transformations that the only way you would be able to derive how it was produced would be by knowing its original input.
How is hashing used?
There are a variety of ways that hashing has been employed in modern technologies. Some services use hashing to ensure file integrity; if a file’s hash has changed in being transferred, the user knows it was somehow altered/corrupted. Hashing is also a recommended way for databases to store user credentials; if ever their database were to be breached by hackers, the hackers would only be in possession of unintelligible numbers instead of cleartext usernames/passwords. Programmers also make use of hashes in conjunction with hash tables to make very fast key-value pair data structures; these create a means for programs to access data fast and effectively.
Hashing vs Encryption
Hashing encodes input instead of encrypting it. The output from hashing functions are representations of the data that was put into it – limited in scope to a number of a given size (note: modern encryption algorithms often present these numbers in hexadecimal, which makes use of not only the digits 0-9, but also the letters A-F to represent 10, 11, 12, 13, 14, and 15, respectively). Encryption, by comparison, translates input into a new output using a cipher. Encrypted messages are not limited in size.
It should be noted however, that hashing is often used in the process of encryption through the use of seeds (often referred to as “numbers used only once” or nonce) and salt.
Common Hashing Algorithms
A variety of hashing algorithms have been developed and implemented over time. One such algorithm – Message Digest Algorithm 5 (MD5) – was once one of the most secure and prolific hashing functions used in conventional technologies. In 2004 however, Chinese researchers discovered that the algorithm was vulnerable to collisions; subsequent discoveries throughout the decade further revealed the extent of vulnerabilities endemic to the MD5 algorithm.
The Secure Hash Algorithm (SHA) family is now the industry standard (although early iterations of SHA, such as SHA-0 and SHA-1 have likewise been found to be vulnerable to compromise). SHA-2 was the technical successor of the series, being more complicated than SHA-1; however, since it shares the same mathematical operations and structure as SHA-1, it’s been predicted that it will only be a matter of time before it also becomes compromised. As such, in 2015 researchers released SHA-3, which may yet be an effective replacement for SHA-2.
Example of a Basic Hash
We can demonstrate how a rudimentary hash is implemented with some hackneyed Java. In the code below, we are cycling arbitrary string inputs through a for loop. Within the for loop, we are:
- Converting each character to its ASCII integer representation (for example: h = 104).
- Adding the integer to the existing sum total of previous hashed characters in the string.
- Multiplying the result by a prime number (31) for security Finally, we convert the output to hex (just as a conventional standard).
Finally, we convert the output to hex (just as a conventional standard).
public class hashExample{
public static void main(String[] args) {
String exampleString = "hash this";
String secondExample = "hash this";
String diffExample = "different hash";
int hash1 = 0;
int hash2 = 0;
int hash3 = 0;
for (int i = 0; i < exampleString.length(); i++){
System.out.println(exampleString.charAt(i) + ": " + (int) exampleString.charAt(i));
hash1 += 31 * (hash1 + (int) exampleString.charAt(i));
}
//h: 104
//a: 97
//s: 115
//h: 104
// : 32
//t: 116
//h: 104
//i: 105
//s: 115
for (int i = 0; i < secondExample.length(); i++){
System.out.println(secondExample.charAt(i) + ": " + (int) secondExample.charAt(i));
hash2 += 31 * (hash2 + (int) secondExample.charAt(i));
}
//h: 104
//a: 97
//s: 115
//h: 104
// : 32
//t: 116
//h: 104
//i: 105
//s: 115
for (int i = 0; i < diffExample.length(); i++){
System.out.println(diffExample.charAt(i) + ": " + (int) diffExample.charAt(i));
hash3 += 31 * (hash3 + (int) diffExample.charAt(i));
}
//d: 100
//i: 105
//f: 102
//f: 102
//e: 101
//r: 114
//e: 101
//n: 110
//t: 116
// : 32
//h: 104
//a: 97
//s: 115
//h: 104
System.out.println("hash1: " + Integer.toHexString(hash1)); //hash1: b53a04cd
System.out.println("hash2: " + Integer.toHexString(hash2)); //hash2: b53a04cd
System.out.println("hash3: " + Integer.toHexString(hash3)); //hash3: dc7cc638
}
}