Cryptography 101 for Software Engineers
The things you need to know about cryptography as a Software Engineer who doesn't have time for lots of weird math.
This week, we’re going to talk about cryptography. It’s easy to get wrong, but knowing the basics helps. First, a disclaimer: I’m not a cryptographer, just an engineer who has worked on cryptography libraries used in large production systems.
In the real world, these are the use cases for cryptography you’re likely to come across.
Encryption
Encryption is the conversion of some (generally sensitive) information, the “plaintext”, into an alternative form known as “ciphertext”, with the help of a pseudo-random secret key. With a good encryption algorithm (or cipher), recovering the plaintext from the ciphertext without access to the key is computationally infeasible.
Encryption can be authenticated or unauthenticated. An authenticated cipher attaches an authentication tag to the ciphertext. During decryption, the cipher checks this tag and rejects any ciphertext that was not produced with the same key or that has been modified. This protects against an attacker who feeds crafted or tampered ciphertexts to your system and watches how it reacts, trying to find patterns and learn about the plaintext or the key.
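Encryption can also be deterministic or non-deterministic. A deterministic cipher always returns the same ciphertext for a given plaintext and key, while a non-deterministic cipher returns a different ciphertext each time it encrypts the same plaintext. Determinism leaks information: anyone who sees two identical ciphertexts knows the underlying plaintexts are identical too.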
For most cases, you should probably be using a cipher that’s authenticated and non-deterministic. Google recommends AES-GCM for most encryption purposes.
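As a minimal sketch of authenticated, non-deterministic encryption with AES-GCM, here is Python using the third-party `cryptography` package (an assumption; any vetted library works). The helper names `encrypt` and `decrypt` are hypothetical, and storing the 12-byte random nonce in front of the ciphertext is one common convention, not the only one.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the standard size for AES-GCM

# A 256-bit key from a cryptographically secure random source.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)


def encrypt(plaintext: bytes) -> bytes:
    # A fresh random nonce per message makes the output non-deterministic.
    # Never reuse a nonce with the same key.
    nonce = os.urandom(NONCE_SIZE)
    # The nonce is not secret; store it in front of the ciphertext + tag.
    return nonce + aead.encrypt(nonce, plaintext, None)


def decrypt(blob: bytes) -> bytes:
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    # Raises InvalidTag if the nonce, ciphertext, or tag was modified.
    return aead.decrypt(nonce, ciphertext, None)


plaintext = b"top secret"
c1, c2 = encrypt(plaintext), encrypt(plaintext)
print(c1 != c2)                  # True: same plaintext, different ciphertexts
print(decrypt(c1) == plaintext)  # True

tampered = bytearray(c1)
tampered[-1] ^= 0x01  # flip one bit of the 16-byte authentication tag
try:
    decrypt(bytes(tampered))
except InvalidTag:
    print("tampered ciphertext rejected")
```

Both properties from above show up directly: the random nonce makes repeated encryptions of the same plaintext differ, and the authentication tag makes decryption fail loudly instead of returning garbage when the ciphertext has been modified.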
Message Authentication Codes (MAC)
A MAC is a short tag, computed from a message and a shared secret key, that gives the reader of the message a strong guarantee that it was sent by a known sender holding that key and has not been altered by an attacker. A very common place to come across MACs is webhooks: webhook calls are often accompanied by a MAC so that the receiving endpoint can verify the identity of the caller.
The most common MAC algorithm I’ve seen in the wild is HMAC-SHA256, although Google recommends HMAC-SHA512 for most use cases. With MACs, especially for webhooks, the choice of algorithm is sometimes made by the service calling your webhook rather than by you.
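A webhook receiver verifying an HMAC-SHA256 signature might look roughly like this, using only Python's standard library. The secret, the hex encoding, and the assumption that the signature arrives as a string (typically in a request header) are illustrative; each webhook provider documents its own scheme, and `sign`/`verify` are hypothetical helper names.

```python
import hashlib
import hmac

# Hypothetical shared secret, agreed with the webhook sender out of band.
WEBHOOK_SECRET = b"replace-with-a-random-secret"


def sign(body: bytes, secret: bytes = WEBHOOK_SECRET) -> str:
    # Sender side: HMAC-SHA256 over the raw request body, hex-encoded.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, received_sig: str, secret: bytes = WEBHOOK_SECRET) -> bool:
    # Receiver side: recompute the MAC over the exact bytes received and
    # compare in constant time so timing does not reveal where they differ.
    expected = sign(body, secret)
    return hmac.compare_digest(expected, received_sig)


body = b'{"event": "invoice.paid", "amount": 4200}'
sig = sign(body)
print(verify(body, sig))                                       # True
print(verify(b'{"event": "invoice.paid", "amount": 1}', sig))  # False
```

`hmac.compare_digest` is used instead of `==` so the check takes the same time regardless of where the two signatures first differ.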
Hashes
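A hash is an algorithm that maps data of arbitrary size to a fixed-size array of bytes. Because there are more possible inputs than outputs, collisions (two inputs that lead to the same output) must exist; a good cryptographic hash makes finding one computationally infeasible. A cryptographic hash is also one-way, meaning it is computationally infeasible to go from the output back to the input.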
Hashes are used for many things. Passwords are hashed[1] before being saved in the database, and hashes are used to give unique identifiers to things like git commits and files.
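The sketch below shows both uses with Python's standard `hashlib`. The fixed-size, repeatable output is what makes SHA-256 useful as an identifier. For passwords it assumes PBKDF2-HMAC-SHA256 with a random per-password salt; the helper names and the iteration count are illustrative assumptions, and dedicated password hashes such as scrypt or Argon2 are common alternatives.

```python
import hashlib
import hmac
import os
from typing import Tuple

# A general-purpose hash: any input size in, fixed 32-byte digest out.
d1 = hashlib.sha256(b"hello").hexdigest()
d2 = hashlib.sha256(b"hello" * 10_000).hexdigest()
print(len(d1), len(d2))                             # 64 64 (hex chars = 32 bytes)
print(d1 == hashlib.sha256(b"hello").hexdigest())   # True: same input, same digest

# Passwords: use a salted, deliberately slow hash, not a single SHA-256.
ITERATIONS = 600_000  # assumption: tune to your hardware and latency budget


def hash_password(password: str) -> Tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique per password, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, digest))  # True
print(check_password("wrong guess", salt, digest))                   # False
```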
Things to consider to keep things secure
If you’re working on something cryptography related, you should keep a few things in mind.
Don’t write your own ciphers. There are too many things to consider and you will likely miss something if you’re not an expert. It’s best to use vetted libraries instead of rolling your own crypto.
Choose a secure default algorithm for each common use case and enforce its use across the codebase, with limited exceptions. People tend to cargo-cult these things, so if your codebase already uses a good default algorithm, it will probably get copied instead of some unsafe alternative.
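One way to enforce a default is a small check in CI that fails when someone reaches for a primitive you have ruled out. This sketch uses Python's `ast` module to flag calls like `hashlib.md5(...)` or `hashlib.new("sha1")`; the banned list and the function name are hypothetical, and it will not catch aliased imports such as `from hashlib import md5`.

```python
import ast
from typing import List, Tuple

# Hypothetical policy: primitives nobody should reach for by default.
BANNED = {"md5", "sha1"}


def find_banned_hash_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for calls like hashlib.md5(...) or hashlib.new("md5")."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only look at calls of the form hashlib.<something>(...).
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "hashlib"):
            continue
        if func.attr in BANNED:
            findings.append((node.lineno, func.attr))
        elif (func.attr == "new" and node.args
              and isinstance(node.args[0], ast.Constant)
              and isinstance(node.args[0].value, str)
              and node.args[0].value.lower() in BANNED):
            findings.append((node.lineno, node.args[0].value.lower()))
    return sorted(findings)


sample = '''
import hashlib
ok = hashlib.sha256(b"x")
bad = hashlib.md5(b"x")
also_bad = hashlib.new("SHA1", b"x")
'''
print(find_banned_hash_calls(sample))  # [(4, 'md5'), (5, 'sha1')]
```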
Think about key rotation from the beginning. At some point, your keys will leak or get logged by a developer, so it should be operationally easy to rotate them. Google’s Tink keysets are a good fit here.
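Here is a minimal sketch of what makes rotation cheap, assuming HMAC-SHA256 and a hypothetical `MacKeyRing` class. Every key has an ID, one key is primary and is used for new tags, and each tag is prefixed with the ID of the key that made it, so older keys keep verifying until you retire them. Tink keysets are built around the same idea of key IDs and a primary key, but this is not Tink's API.

```python
import hashlib
import hmac
import os
import struct


class MacKeyRing:
    """Hypothetical keyring: a primary key for new tags, older keys kept for verification."""

    def __init__(self):
        self._keys = {}          # key_id -> 32-byte secret
        self._primary_id = None
        self._next_id = 1

    def rotate(self) -> int:
        """Add a fresh random key and make it the primary for new tags."""
        key_id = self._next_id
        self._next_id += 1
        self._keys[key_id] = os.urandom(32)
        self._primary_id = key_id
        return key_id

    def retire(self, key_id: int) -> None:
        """Drop an old key; tags made with it stop verifying."""
        if key_id == self._primary_id:
            raise ValueError("cannot retire the primary key")
        del self._keys[key_id]

    def sign(self, message: bytes) -> bytes:
        if self._primary_id is None:
            raise RuntimeError("call rotate() to create a key first")
        # 4-byte big-endian key ID prefix tells verify() which key to use.
        prefix = struct.pack(">I", self._primary_id)
        key = self._keys[self._primary_id]
        return prefix + hmac.new(key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        if len(tag) != 4 + 32:
            return False
        (key_id,) = struct.unpack(">I", tag[:4])
        key = self._keys.get(key_id)
        if key is None:
            return False  # unknown or retired key
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag[4:])


ring = MacKeyRing()
old_id = ring.rotate()
old_tag = ring.sign(b"hello")
ring.rotate()                                      # new primary key
print(ring.verify(b"hello", old_tag))              # True: old key still known
print(ring.verify(b"hello", ring.sign(b"hello")))  # True
ring.retire(old_id)
print(ring.verify(b"hello", old_tag))              # False: old key retired
```

Rotating becomes a two-step operation: add a new primary key right away, then retire the old one once nothing still depends on it.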