HMAC vs. raw SHA-1

okay - maybe this shouldn’t have taken me quite so long to understand, but I’ve been a little bit confused about the differences between SHA-1 and HMAC.

HMAC employs a cryptographic hashing function (ex. SHA-1) but it wasn’t clear to me why the cryptographic hashing function itself wasn’t “good enough” – why couldn’t HMAC just be SHA-1.

SHA-1 generates a fixed size output of 20-bytes for an arbitrarily long message; but so does an HMAC when it uses SHA-1. So what’s the difference?

Turns out the answer is actually relatively straightforward.

For sake of explanation, assume that you want to declare your undying love to someone you’ve been dating. You’d love to come up with a beautiful sonnet, but in the end you decide that simply saying “i love you” is enough.

You want the message to arrive intact and unaltered, but you don’t care if the contents of message itself are known to the world. Knowing a little about cryptographic hashes: you generate a digest from your message using SHA-1. That message results in: bb7b1901d99e8b26bb91d2debdb7d7f24b3158cf.

On receipt of the message, your would-be-love recomputes the SHA-1 from the message, compares the computed digest to the sent digest. They match and all seems well.

A sinister rival however has other plans. They intercept your message, and replace the message with something completely different: “don’t call me anymore”. Then they generate a brand new digest, and replace yours: e267e18f05cb6ea3b10b761bbac21a0f92bb8d0d

On receipt, your love reads the message in disbelief; quickly calculating the hash to make sure the message hasn’t been altered. But the hash itself has been changed so the altered hash matches altered message and chaos ensues.

Things look grim, but you explain to your would-be-love what’s happened, and they decide to give you another chance. So that this doesn’t happen again you decide to tell your lover from now on, whenever they get a message from you, before computing the hash prepend the text “our secret key.”, and you will do the same.

This time that same message generates the digest 8a2c1bfa977478f73dbfab8508bc09360b20b569

Simply replacing the digest doesn’t work anymore. If naive attacker still attempts to use the e267e18f... digest your lover would see that the key + the message doesn’t compute. You don’t send the key in the message itself, and no one knows your secret key so no one can generate a fake message.

There is however a problem still, and the problem is the reason for the difference between SHA-1 and HMAC.

SHA-1 uses an iterative algorithm. It generates digests by first splitting a message into blocks of 64 bytes and, one after the other, combining those blocks together to generate the 20 byte digest. But, since your message can be of any length, and since SHA by its iterative nature works by computing block after block of 64 bytes there is a problem.

Your rival trying once again to subvert your message could just tack additional data onto your message, and this time use the digest in your message as the seed to generate their own new digest of your message. They don’t need your secret key because the key was already embedded the blocks that you built. They can’t alter what you’ve written, but they can add more. Your lack of punctuation has in fact made this even easier.

By simply adding “but please don’t call me anymore” and updating the digest to 725fbcbd1e94d03c2e54b01da3944c6385d17e4d your love will think the entire message is from you even though only the first part was – and doubly so because of the secret key.

Good bye romance.

But… an HMAC fixes this.

The algorithm adds one more layer: essentially it takes the hash of your key + message, prepends the key to that hash, and then re-hashes the result. I say essentially because it actually does one other thing to make things more cryptographically sound. HMAC masks your key during the first – inner – hash with a fixed constant. Then on the second – outer – hash it masks your key again with a different fixed constant. The masking operations result in a different inner and outer key value, and the entire process effectively seals your message, hides your key, and makes it impossible to tack new data on the end.

According to wikipedia no known message extension attacks have ever been found.

Good luck romance.

Comments: Link to heading

Thanks for the explanation. Link to heading

Simo Salminen - Mar 1, 2009

Thanks for the explanation.

exactly what I was looking for, thank you. Link to heading

Brian House - Feb 4, 2010

exactly what I was looking for, thank you.

Good explanation, but I have a question about it. … Link to heading

Unknown - Jun 1, 2010

Good explanation, but I have a question about it.
SHA-1 uses padding that incorporates the length of the original message.
Suppose the original length of the message is 10 bytes and the modified one is 15 bytes.
When the hash is calculated, the first block will be different at the sender and the receiver side (since the messages have a different length).
How can this result in the same hash at both sides?

you’re right that the hash includes includes t… Link to heading

ionous - Jun 4, 2010

you’re right that the hash includes includes the length. my explanation definitely glosses over the way the actual extension attack works.

many hashing algorithms, SHA included, consist of two discrete stages.

in the first stage, the algorithm splits the message into fixed size blocks. as it does so, it pads the message to make sure the final block is of the right size. and, as part of that padding, it incorporates the length.

in the second stage, the algorithm generates the actual hash by digesting those fixed size blocks. the math for SHA is honestly above my head, but the key here is that it digests those blocks without regard for what was content, padding, or length.

for the extension attack then, when you extend the message you actually have to first incorporate the length and padding into the message content, restart the hashing function using the value of the hash itself - and *then* tack on your new message.

for a plain-text message you’d generally be able to notice alteration b/c, for instance, the example message would actually look like: i love you (padding data+original length) but please don’t call me anymore.

further, it does mean you have to know what the original length of the message (key+message text) was in order to be add that data to the message. hard as that sounds, you actually might be able to guess the key length by looking at multiple messages, or, in some cases, it might have a well known, predefined length.

as a great real world example, here’s a more in depth explanation of the attack and how it was used against flickr.

Awesome post. Link to heading

Unknown - Sep 2, 2012

Awesome post.

Excellent explanation!! Link to heading

The Student - Feb 4, 2013

Excellent explanation!!

In Network Security and Cyptography, these relativ… Link to heading

Unknown - Sep 0, 2013

In Network Security and Cyptography, these relative terms are hard to borderline and so things might get confusing.

Thank you for crystal clear explanation.

Very useful. Thanks for your explanation. Link to heading

Jackmicro - Feb 1, 2014

Very useful. Thanks for your explanation.

Thanks for this romantic explanation, very useful…. Link to heading

Unknown - Feb 3, 2014

Thanks for this romantic explanation, very useful. Cheers!!

How is “8a2c1bfa977478f73dbfab8508bc09360b20b… Link to heading

flz - Apr 1, 2014

How is “8a2c1bfa977478f73dbfab8508bc09360b20b569” calculated?
SHA1(our secret key.i love you)=E0759E9B59BDD6D864D29CE3A502ADB6257F7615

flz, i think you’re right, i get the same as y… Link to heading

JoeSchmoe - May 5, 2014

flz, i think you’re right, i get the same as you do.

but whatever, that’s not really the point and thanks for the explanation, O.P. Dude :)