How (Not) to Design a Hash Function
Recently, Casey Muratori implemented refterm, a proof-of-concept terminal designed around fast input processing and rendering. This is an important and commendable effort, since the vast majority of software performs tens to thousands of times slower than it can.[citation needed]
One of the design choices of refterm is to use a hash table to cache glyph rendering. The key in this table is the UTF-8 string of a single glyph. To avoid string allocations, the hash value of the glyph bytes is used as a key instead.
When Casey was asked about the possibility of hash collisions on stream, he responded with a claim that the hash function used in refterm is “a strong hash”, and that the complexity of finding a collision is about 2^64 operations. After analyzing the code of the hash function used in refterm, a few flaws were found, and O(1) attacks were discovered.
All the source code presented here comes from the most recent refterm code at the moment of writing this post, commit 20a063f3700ac4c2913076864cb560301e76b2dc.
The hash function ComputeGlyphHash is located in refterm_example_source_buffer.c:166.
Operation of the hash
- The hash value is initialized to the input length
- The hash value is xored with the provided seed value, which is always equal to DefaultSeed
- For each full 16-byte chunk, the following bit mixing method is used:
  - the hash value is xored with the input chunk
  - the hash value is fed four times through _mm_aesdec_si128 with zero key
- For the last chunk, it is zero-padded to 16 bytes
- And fed through the same bit mixing method as described above
In FP terms, this could be described as:
ComputeHash(Input) = ZeroPad(Input, 16).chunks_exact(16)
.fold(Length ^ DefaultSeed, |Hash, Chunk| AesDec4Times(Hash ^ Chunk, Zero))
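For reference, here is a minimal, self-contained C sketch of that procedure. The names SketchGlyphHash and AesDec4Times are illustrative, and the trailing-chunk handling is simplified; this is not the verbatim refterm code.
#include <string.h>
#include <immintrin.h>
// Four aesdec rounds with a zero key (the bit mixing step described above).
static __m128i AesDec4Times(__m128i Hash)
{
    __m128i Zero = _mm_setzero_si128();
    Hash = _mm_aesdec_si128(Hash, Zero);
    Hash = _mm_aesdec_si128(Hash, Zero);
    Hash = _mm_aesdec_si128(Hash, Zero);
    Hash = _mm_aesdec_si128(Hash, Zero);
    return Hash;
}
// Seed the hash with (length ^ seed), then fold in 16-byte chunks.
// The last chunk is zero-padded, which is the root of Flaw 1 below.
static __m128i SketchGlyphHash(const unsigned char *At, size_t Count, __m128i Seed)
{
    __m128i Hash = _mm_xor_si128(_mm_cvtsi64_si128((long long)Count), Seed);
    while(Count >= 16)
    {
        __m128i In = _mm_loadu_si128((const __m128i *)At);
        Hash = AesDec4Times(_mm_xor_si128(Hash, In));
        At += 16;
        Count -= 16;
    }
    unsigned char Temp[16] = {0};
    memcpy(Temp, At, Count);
    __m128i Last = _mm_loadu_si128((const __m128i *)Temp);
    Hash = AesDec4Times(_mm_xor_si128(Hash, Last));
    return Hash;
}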
Flaw 1: Padding weakness
The input is padded with zero bytes. This means that if the last block ends with zeros, the result of processing this block is going to be the same as if that block were right-trimmed of its nul bytes.
char Temp[16];
__movsb((unsigned char *)Temp, At, Overhang);
__m128i In = _mm_loadu_si128((__m128i *)Temp);
In = _mm_and_si128(In, _mm_loadu_si128((__m128i *)(OverhangMask + 16 - Overhang)));
Since the hash value is initialized with the input length, which is then xored with the first block, the attack is a little bit more complex. The first byte (on little-endian systems) needs to be xored with the xor of the input lengths. For example, if the input is "" (the empty string) and we append a nul byte "\x00", the length becomes 1, so the first byte of the padded block is effectively xored with 0x01. If we instead use the value 0x01 for the first byte, the produced hash of the string "\x01" will be the same as the hash of the string "".
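To make this concrete, here is a hedged check of the first collision pair using the SketchGlyphHash helper from the sketch above (DefaultSeed stands in for whatever seed value refterm passes):
// hash("") and hash("\x01") should come out identical.
__m128i H0 = SketchGlyphHash((const unsigned char *)"", 0, DefaultSeed);
__m128i H1 = SketchGlyphHash((const unsigned char *)"\x01", 1, DefaultSeed);
// All 16 bytes match when the movemask of the byte comparison is 0xFFFF.
int Collision = _mm_movemask_epi8(_mm_cmpeq_epi8(H0, H1)) == 0xFFFF;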
Example of collisions:
message: "", hash: [41148b628ab49fce6d55977058831078]
message: "\x01", hash: [41148b628ab49fce6d55977058831078]
message: "A", hash: [86dd41d6a69d8f979dc9c5ca6fc27cce]
message: "B\x00", hash: [86dd41d6a69d8f979dc9c5ca6fc27cce]
message: "BAAAAAAAAAAAAAAA", hash: [32b05cfcaa74f585fc6c4fd7cef4a99]
message: "CAAAAAAAAAAAAAAA\x00", hash: [32b05cfcaa74f585fc6c4fd7cef4a99]
Mitigating Padding Attack
One of the mitigations for the padding attack is to initialize the padding to some value depending on the overhang length. If done correctly, no inputs of different lengths will produce the same padded block. A computationally inexpensive solution would be to set the padding bytes to the overhang length. This only works if the overhang chunk is always calculated.
Example code:
// Read the last block
char Temp[16];
__movsb((unsigned char *)Temp, At, Overhang);
__m128i In = _mm_loadu_si128((__m128i *)Temp);
// Compute padding depending on overhang length to prevent padding attacks
__m128i Mask = _mm_loadu_si128((__m128i *)(OverhangMask + 16 - Overhang));
// _mm_blendv_epi8 requires SSE4, so we use `andnot` + `and` instead
// In = _mm_blendv_epi8(_mm_set1_epi8((char)Overhang), In, Mask);
__m128i Padding = _mm_andnot_si128(Mask, _mm_set1_epi8((char)Overhang));
In = _mm_or_si128(_mm_and_si128(Mask, In), Padding);
// At this point, In has been padded securely.
Before:
Input: [] -> Padded [00 00 00 00...]
Input: [00] -> Padded [00 00 00 00...]
After:
Input: [] -> Padded [00 00 00 00...]
Input: [00] -> Padded [00 01 01 01...]
Flaw 2: Invertible hashing operations
AES is symmetric encryption, which means that the encryption and decryption functions are invertible, given that you know the encryption key. Since the key used for the bit mixing operations is constant (all zeros), the bit mixing operations are easily invertible.
NextHash = AesDec4Times(PreviousHash ^ Chunk, 0)
Given that AesDec4Times is invertible:
InvAesDec4Times(NextHash, 0) = PreviousHash ^ Chunk
Chunk = PreviousHash ^ InvAesDec4Times(NextHash, 0)
PreviousHash = Chunk ^ InvAesDec4Times(NextHash, 0)
This equation allows us to choose two out of three variables arbitrarily. Given fixed PreviousHash and NextHash, we can calculate a Chunk that will produce NextHash after running one round of ComputeGlyphHash. Also, given NextHash and Chunk, we can calculate the PreviousHash value.
This is pretty bad news for this hash function, since it allows us to:
- Invert a round of hashing, thus recovering the exact input of single-chunk messages. Given Hash, for one chunk we know that InputLength ^ DefaultSeed ^ Chunk = InvAesDec4Times(Hash), thus InputLength ^ Chunk = InvAesDec4Times(Hash) ^ DefaultSeed.
- Prepend a block to change the hash to an arbitrary value.
- Insert a block to change the hash to an arbitrary value.
- Append a block to change the hash to an arbitrary value.
Actually Inverting _mm_aesdec_si128(Hash, Zero)
The hash function uses AES primitives in a non-standard way, so some work is required to invert the operation.
Let’s look at the operation of four AES-NI intrinsics: _mm_aesdec_si128, _mm_aesdeclast_si128, _mm_aesenc_si128, _mm_aesenclast_si128:
_mm_aesdec_si128
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
a[127:0] := InvMixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
_mm_aesdeclast_si128
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
_mm_aesenc_si128
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
a[127:0] := MixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
_mm_aesenclast_si128
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
Since RoundKey is always zero in our case, the xor operations may be omitted below.
To invert _mm_aesdec_si128, we need to run it in reverse, with the opposite operations: InvShiftRows -> InvSubBytes -> InvMixColumns is inverted to MixColumns -> SubBytes -> ShiftRows.
Initially, InvAesDec was implemented using primitives from Tiny AES in C:
__m128i InvAesDec(__m128i data, __m128i key) {
// Implemented in terms of aes primitives, using tiny-aes
data = _mm_xor_si128(data, key);
MixColumns((state_t*)&data);
SubBytes((state_t*)&data);
ShiftRows((state_t*)&data);
return data;
}
Furthermore, Discord user @davee
has helpfully pointed out that the same operation
can be implemented using AES-NI intrinsics the following way:
__m128i InvAesDec(__m128i data, __m128i key) {
// Implemented using AES-NI intrinsics, credit to @davee
data = _mm_xor_si128(data, key);
__m128i zero = _mm_setzero_si128();
data = _mm_aesenc_si128(_mm_aesdeclast_si128(data, zero), zero);
return _mm_aesenclast_si128(data, zero);
}
Why does this work? Let’s write out the operations:
// aesdec(data, key)
// the operation we would like to invert
InvShiftRows, InvSubBytes, InvMixColumns, XOR Key,
// xor key
XOR Key,
// aesdeclast(data, zero)
InvShiftRows, InvSubBytes, XOR zero,
// aesenc(data, zero)
ShiftRows, SubBytes, MixColumns, XOR zero,
// aesenclast(data, zero)
ShiftRows, SubBytes, XOR zero,
All XOR zero operations are no-ops, and can be omitted. Two XOR Key operations are next to each other, so they cancel each other out. So we are left with these ten:
InvShiftRows, InvSubBytes, InvMixColumns,
InvShiftRows, InvSubBytes,
ShiftRows, SubBytes, MixColumns,
ShiftRows, SubBytes,
Now, we need to delve a little into the inner workings of AES. We don’t need to know much, just these few things:
- InvX operations invert X operations.
- ShiftRows and InvShiftRows operations swap bytes around.
- SubBytes and InvSubBytes substitute bytes.
- For the current purpose, MixColumns is viewed as a black box that doesn’t commute with anything else.
Because of what (Inv)ShiftRows and (Inv)SubBytes do, they commute with each other: byte substitution is position-independent, and byte swapping is independent of contents.
Let’s rewrite the operations by swapping consecutive InvShiftRows
and InvSubBytes
:
InvSubBytes(5), InvShiftRows(4), InvMixColumns(3),
InvSubBytes(2), InvShiftRows(1),
ShiftRows(1), SubBytes(2), MixColumns(3),
ShiftRows(4), SubBytes(5),
Now we can see that the inverse operations are paired up, so they all cancel out. Therefore, InvAesDec(AesDec(data, key), key) is a no-op. QED
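A quick sanity check of this claim, using the InvAesDec function above with arbitrary test values:
__m128i Key  = _mm_set1_epi8(0x42);
__m128i Data = _mm_set1_epi8(0x17);
// Round trip: one aesdec round, then its inverse.
__m128i Back = InvAesDec(_mm_aesdec_si128(Data, Key), Key);
// Back should equal Data byte-for-byte.
int IsNoOp = _mm_movemask_epi8(_mm_cmpeq_epi8(Back, Data)) == 0xFFFF;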
Since we’re interested in inverting four rounds of AesDec, rather than calling InvAesDec four times we can optimize this operation.
Note: the xor operations can be omitted in this case since the key is zero.
__m128i InvAesDecX4(__m128i data, __m128i key) {
__m128i zero = _mm_setzero_si128();
data = _mm_xor_si128(data, key);
data = _mm_aesdeclast_si128(data, zero);
data = _mm_aesenc_si128(data, zero);
data = _mm_xor_si128(data, key);
data = _mm_aesenc_si128(data, zero);
data = _mm_xor_si128(data, key);
data = _mm_aesenc_si128(data, zero);
data = _mm_xor_si128(data, key);
data = _mm_aesenc_si128(data, zero);
return _mm_aesenclast_si128(data, zero);
}
This function allows us to perform all of the attacks mentioned before.
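For example, a minimal sketch of forging a single mixing round: given a known PreviousHash and a desired NextHash (both placeholders here), the required Chunk falls out directly from the equations above:
__m128i Zero = _mm_setzero_si128();
// Chunk = PreviousHash ^ InvAesDec4Times(NextHash, 0)
__m128i Chunk = _mm_xor_si128(PreviousHash, InvAesDecX4(NextHash, Zero));
// Mixing PreviousHash with this Chunk now lands exactly on NextHash.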
Demonstrating these attacks
Demonstrating invert attack, invert a hash up to 15 bytes
Note: due to the padding attack, there are actually more messages that produce this hash
plaintext: [51, 77, 65, 72, 74, 79, 31, 32, 33]
hash: [8, 6e, c4, db, 1f, d1, 5b, b7, 95, d5, 44, ea, 5, 2c, bd, 4b]
recovered: [51, 77, 65, 72, 74, 79, 31, 32, 33]
hash: [8, 6e, c4, db, 1f, d1, 5b, b7, 95, d5, 44, ea, 5, 2c, bd, 4b]
Demonstrating prefix attack, prepend a block and avoid affecting the hash
message: [68, 65, 6c, 6c, 6f]
hash: [c1, 1e, 0, 16, 4a, f5, 67, 27, e, c1, 1c, ea, 39, 74, e6, 4b]
prefix: [1d, a4, 56, e6, e9, c3, 64, 70, 5c, e2, 8e, 11, ab, 23, a7, 9c]
forgery: [1d, a4, 56, e6, e9, c3, 64, 70, 5c, e2, 8e, 11, ab, 23, a7, 9c, 68, 65, 6c, 6c, 6f]
hash: [c1, 1e, 0, 16, 4a, f5, 67, 27, e, c1, 1c, ea, 39, 74, e6, 4b]
Demonstrating chosen prefix attack, append a block and produce arbitrary hash
prefix: [68, 65, 6c, 6c, 6f]
forgery: [68, 65, 6c, 6c, 6f, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, be, c2, 21, f0, 83, 9d, a6, 0, 61, 58, 2c, ac, f7, 33, 74, b2]
hash: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Demonstrating preimage attack, prepend a block and produce arbitrary hash
suffix: [68, 65, 6c, 6c, 6f]
goal hash: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
prefix: [f4, 47, d8, c8, 25, 21, 17, 41, 91, 82, 83, 31, ca, 60, 37, 4b]
message: [f4, 47, d8, c8, 25, 21, 17, 41, 91, 82, 83, 31, ca, 60, 37, 4b, 68, 65, 6c, 6c, 6f]
hash: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The code for these attacks is available at github.com/m1el/refterm-hash-break.
Mitigating invertible hash operations
To mitigate these attacks, we take a look at publicly available constructions of one-way functions.
The hash function in refterm is most similar to the Davies-Meyer construction, which, given a cryptographically secure symmetric encryption function Encrypt, produces a secure one-way compression function.
It can be expressed in FP terms the following way:
ComputeHash(Input) = SecurePad(Input, 16).chunks_exact(16)
.fold(Zero, |Hash, Chunk| Hash ^ Encrypt(Hash, Chunk))
The implementation of this construction for the refterm bit mixing operation would look like this:
__m128i PreviousHash = HashValue;
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_aesdec_si128(HashValue, In);
HashValue = _mm_xor_si128(HashValue, PreviousHash);
Please note that this is using AesDec4Times as Encrypt, and it is NOT cryptographically secure.
This is not a proper use of AES primitives, and I suspect there may exist attacks on this hash function which I’m currently not aware of.
However, this provides significantly more security than the previously existing hash function in refterm.
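Putting the two mitigations together, a minimal sketch of the whole mitigated loop might look like the following. This is illustrative only (the name and the trailing-chunk handling are mine), not the exact code from the pull request linked below.
static __m128i MitigatedHashSketch(const unsigned char *At, size_t Count, __m128i Seed)
{
    __m128i Hash = _mm_xor_si128(_mm_cvtsi64_si128((long long)Count), Seed);
    for(;;)
    {
        __m128i In;
        if(Count >= 16)
        {
            In = _mm_loadu_si128((const __m128i *)At);
        }
        else
        {
            // Flaw 1 fix: pad the last chunk with the overhang length, not zeros.
            unsigned char Temp[16];
            memset(Temp, (int)Count, sizeof(Temp));
            memcpy(Temp, At, Count);
            In = _mm_loadu_si128((const __m128i *)Temp);
        }
        // Flaw 2 fix: Davies-Meyer-style feedforward, Hash ^= Encrypt(Hash, In).
        __m128i Prev = Hash;
        Hash = _mm_aesdec_si128(Hash, In);
        Hash = _mm_aesdec_si128(Hash, In);
        Hash = _mm_aesdec_si128(Hash, In);
        Hash = _mm_aesdec_si128(Hash, In);
        Hash = _mm_xor_si128(Hash, Prev);
        if(Count < 16) break;
        At += 16;
        Count -= 16;
    }
    return Hash;
}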
Conclusions
Multiple weaknesses were found in refterm’s ComputeGlyphHash, with practical demonstrations of attacks. This shows that ComputeGlyphHash is not a cryptographically secure hash function.
Well-known mitigations (proper padding, the Davies-Meyer construction) were implemented and published. The mitigations produced negligible performance losses.
These attacks and mitigations could have been found and implemented by anyone with a cursory understanding of cryptography.
Do not roll your own crypto if you don’t know what you’re doing.
Do not claim that you have built a strong hash function if you haven’t passed it through nine circles of peer review hell.
The attack code is available here: github.com/m1el/refterm-hash-break.
The mitigations are available here: github.com/cmuratori/refterm/pull/40.