HMAC vs. raw SHA-1

okay- maybe this shouldn't have taken me quite so long to understand, but I've been a little bit confused about the differences between SHA-1 and HMAC.

HMAC employs a cryptographic hashing function (ex. SHA-1) but it wasn't clear to me why the cryptographic hashing function itself wasn't "good enough" -- why couldn't HMAC just be SHA-1.

SHA-1 generates a fixed size output of 20-bytes for an arbitrarily long message; but so does an HMAC when it uses SHA-1. So what's the difference?

Turns out the answer is actually relatively straightforward.

For sake of explanation, assume that you want to declare your undying love to someone you've been dating. You'd love to come up with a beautiful sonnet, but in the end you decide that simply saying "i love you" is enough.

You want the message to arrive intact and unaltered, but you don't care if the contents of message itself are known to the world. Knowing a little about cryptographic hashes: you generate a digest from your message using SHA-1.

That message results in: 'bb7b1901d99e8b26bb91d2debdb7d7f24b3158cf'.

On receipt of the message, your would-be-love recomputes the SHA-1 from the message, compares the computed digest to the sent digest. They match and all seems well.

A sinister rival however has other plans. They intercept your message, and replace the message with another "don't call me anymore", they then generate a brand new digest: 'e267e18f05cb6ea3b10b761bbac21a0f92bb8d0d' and replace your original digest. On receipt your love reads the message in disbelief; quickly calculating the hash to make sure the message hasn't been altered. But the hash itself has been changed so the altered hash matches altered message and chaos ensues.

Things look grim, but you explain to your would-be-love what's happened, and they decide to give you another chance. So that this doesn't happen again you decide to tell your lover from now on, whenever they get a message from you, before computing the hash prepend the text "our secret key.", and you will do the same.

This time that same message generates the digest '8a2c1bfa977478f73dbfab8508bc09360b20b569'

Simply replacing the digest doesn't work anymore. If naive attacker still attempts to use the 'e267e18f...' digest your lover would see that the key + the message doesn't compute. You don't send the key in the message itself, and no one knows your secret key so no one can generate a fake message.

There is however a problem still, and the problem is the reason for the difference between SHA-1 and HMAC.

SHA-1 uses an iterative algorithm. It generates digests by first splitting a message into blocks of 64 bytes and, one after the other, combining those blocks together to generate the 20 byte digest. But, since your message can be of any length, and since SHA by its iterative nature works by computing block after block of 64 bytes there is a problem.

Your rival trying once again to subvert your message could just tack additional data onto your message, and this time use the digest in your message as the seed to generate their own new digest of your message. They don't need your secret key because the key was already embedded the blocks that you built. They can't alter what you've written, but they can add more. Your lack of punctuation has in fact made this even easier.

By simply adding
"but please don't call me anymore" and updating the digest to '725fbcbd1e94d03c2e54b01da3944c6385d17e4d' your love will think the entire message is from you even though only the first part was -- and doubly so because of the secret key.

Good bye romance.

An HMAC fixes this.

The algorithm adds one more layer: essentially it takes the hash of your key + message, prepends the key to that hash, and then re-hashes the result. I say essentially because it actually does one other thing to make things more cryptographically sound. HMAC masks your key during the first -- inner -- hash with a fixed constant. Then on the second -- outer -- hash it masks your key again with a different fixed constant. The masking operations result in a different inner and outer key value, and the entire process effectively seals your message, hides your key, and makes it impossible to tack new data on the end.

According to wikipedia no known message extension attacks have ever been found.

Good luck romance.

12 comments:

Simo Salminen said...

Thanks for the explanation.

Brian House said...

exactly what I was looking for, thank you.

Unknown said...

Good explanation, but I have a question about it.
SHA-1 uses padding that incorporates the length of the original message.
Suppose the original length of the message is 10 bytes and the modified one is 15 bytes.
When the hash is calculated, the first block will be different at the sender and the receiver side (since the messages have a different length).
How can this result in the same hash at both sides?

ionous said...

you're right that the hash includes includes the length. my explanation definitely glosses over the way the actual extension attack works.

many hashing algorithms, SHA included, consist of two discrete stages.

in the first stage, the algorithm splits the message into fixed size blocks. as it does so, it pads the message to make sure the final block is of the right size. and, as part of that padding, it incorporates the length.

in the second stage, the algorithm generates the actual hash by digesting those fixed size blocks. the math for SHA is honestly above my head, but the key here is that it digests those blocks without regard for what was content, padding, or length.

for the extension attack then, when you extend the message you actually have to first incorporate the length and padding into the message content, restart the hashing function using the value of the hash itself - and *then* tack on your new message.

for a plain-text message you'd generally be able to notice alteration b/c, for instance, the example message would actually look like: i love you (padding data+original length) but please don't call me anymore.

further, it does mean you have to know what the original length of the message (key+message text) was in order to be add that data to the message. hard as that sounds, you actually might be able to guess the key length by looking at multiple messages, or, in some cases, it might have a well known, predefined length.

as a great real world example, here's a more in depth explanation of the attack and how it was used against flickr.

Unknown said...

Awesome post.

The Student said...

Excellent explanation!!

ionous said...

Thanks for saying you liked it! Much appreciated!

Unknown said...

In Network Security and Cyptography, these relative terms are hard to borderline and so things might get confusing.

Thank you for crystal clear explanation.

Jackmicro said...

Very useful. Thanks for your explanation.

Unknown said...

Thanks for this romantic explanation, very useful. Cheers!!

flz said...

How is "8a2c1bfa977478f73dbfab8508bc09360b20b569" calculated?
SHA1(our secret key.i love you)=E0759E9B59BDD6D864D29CE3A502ADB6257F7615

JoeSchmoe said...

flz, i think you're right, i get the same as you do.

but whatever, that's not really the point and thanks for the explanation, O.P. Dude :)