The Advanced Encryption Standard (AES) is a symmetric block cipher that encrypts data in blocks of 16 bytes regardless of key length. It is regarded as secure enough for regular applications, with no publicly known practical attack. Because the cipher is symmetric, the same key is used for both encryption and decryption. The output looks like random noise. Often the output keying material (OKM) of a key derivation function (KDF) is used as the cipher-key. AES is featured inside Radikant-Crypto-C.
AES is a deterministic state machine that transforms the input-message into a cipher-message using a cipher-key over several rounds. The cipher-key can be 128, 192 or 256 bits long. The cipher-key is expanded into a KeySchedule that is sliced into 128 bit RoundKeys. The first RoundKey is XOR’ed with the plaintext before any rounds start. During each of the 10, 12 or 14 rounds (depending on the cipher-key size), the data undergoes 4 distinct steps: SubBytes, ShiftRows, MixColumns, and AddRoundKey.
In the last round MixColumns is omitted. Its purpose is to spread information into the SubBytes step of future rounds; since no further rounds exist, the designers state it does not contribute to security in any meaningful way and left it out of the final round.
| AES | [W] | KeySchedule | | | | | | | | | bits |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AES128 | 4 | W₀ | W₁ | W₂ | W₃ | exp | exp | exp | exp | exp | 1408 |
| AES192 | 6 | W₀ | W₁ | W₂ | W₃ | W₄ | W₅ | exp | exp | exp | 1664 |
| AES256 | 8 | W₀ | W₁ | W₂ | W₃ | W₄ | W₅ | W₆ | W₇ | exp | 1920 |
The cipher-key is expanded into a larger byte array called the KeySchedule. The total KeySchedule size depends on the number of rounds, and the number of rounds depends on the cipher-key size. The KeySchedule consists of multiple RoundKeys, which are always 128 bits each. Before the first AES round starts, the first RoundKey from the KeySchedule is XOR’ed with the plaintext. Therefore one RoundKey more than the number of rounds is generated. The remaining RoundKeys are consumed one per round.
Rounds
‣ AES128 - 10 rounds ‣ KeySchedule = 1408 bits ‣ 11 RoundKeys ‣ 4 KeyWords¹
‣ AES192 - 12 rounds ‣ KeySchedule = 1664 bits ‣ 13 RoundKeys ‣ 6 KeyWords¹
‣ AES256 - 14 rounds ‣ KeySchedule = 1920 bits ‣ 15 RoundKeys ‣ 8 KeyWords¹
Key expansion
The cipher-key is always fully visible in the first bytes of the KeySchedule and is used as the seed to expand the key to the desired length. The key expansion starts by splitting the cipher-key into 4 byte chunks called words; for example, a 128 bit¹ cipher-key has 4 words: [W₀] [W₁] [W₂] [W₃]. A chain of XOR (⊕) operations, with a G-Function applied every 4, 6 or 8 words depending on the key length, then expands the key by generating new words:
W₄ = W₀ ⊕ G(W₃)   (G-Function)
W₅ = W₁ ⊕ W₄
W₆ = W₂ ⊕ W₅
W₇ = W₃ ⊕ W₆
• The G-Function is applied to the first word of every new batch¹ of KeyWords: for 128 bit¹ (4 KeyWords) that is W₄, W₈, W₁₂, etc.
• AES256 additionally applies a SubBytes step (without RotWord or Rcon) to the previous word halfway through each batch of 8 KeyWords (W₁₂, W₂₀, etc.) before the XOR.
This chain reaction continues until the desired length of the KeySchedule is reached, e.g. 1920 bits for AES256.
Initial AddRoundKey
‣ AES128 - 128 bits | so the entire cipher key is XORed with the plaintext
‣ AES192 - 192 bits | only the first 128 bits (W₀ W₁ W₂ W₃) are XORed with the plaintext.
‣ AES256 - 256 bits | only the first half (W₀ W₁ W₂ W₃) is XORed with the plaintext.
The remaining KeySchedule is sliced into 128 bit RoundKeys, one of which is consumed by the final step (AddRoundKey) of each round. Since every RoundKey is a 128 bit slice of the KeySchedule and the first RoundKey is mixed with the plaintext before the rounds start, the formula is (rounds + 1) × 128 = KeySchedule size in bits.
The G-Function is used in the key expansion (KeySchedule) to introduce unpredictability into the RoundKeys derived from the cipher-key. It consists of three sequential operations:
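The formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name `key_schedule_bits` is hypothetical, and the relation rounds = KeyWords + 6 is simply the pattern visible in the Rounds list above (4 → 10, 6 → 12, 8 → 14).

```python
WORD_BITS = 32        # one KeyWord is 4 bytes
ROUND_KEY_BITS = 128  # every RoundKey is one 16-byte slice


def key_schedule_bits(key_bits: int) -> int:
    """Return the KeySchedule size in bits for a 128, 192 or 256 bit cipher-key."""
    key_words = key_bits // WORD_BITS   # 4, 6 or 8 KeyWords
    rounds = key_words + 6              # 10, 12 or 14 rounds
    round_keys = rounds + 1             # one extra for the initial AddRoundKey
    return round_keys * ROUND_KEY_BITS


for key_bits in (128, 192, 256):
    print(f"AES{key_bits}: {key_schedule_bits(key_bits)} bits")
```

The three printed sizes match the KeySchedule sizes listed for AES128, AES192 and AES256.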
1. RotWord
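Rotates the 4 bytes of the word left by one position:
[a0, a1, a2, a3] → [a1, a2, a3, a0]
2. SubBytes
Each byte is replaced using the S-Box (based on GF(2⁸) inversion).
3. Rcon
The first byte of the word is XOR'ed with the round constant of the current batch, taken from: {0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36}. The entry 0x8d at index 0 is a placeholder that the key expansion never uses; the first batch uses 0x01.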
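The expansion chain and the G-Function described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code taken from Radikant-Crypto-C: the names `gmul`, `build_sbox`, `g_function` and `expand_key` are hypothetical, and the S-Box is computed from its GF(2⁸) definition rather than stored as a table, purely to keep the block self-contained. The example key is the AES-128 sample key from FIPS-197 Appendix A.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox() -> list:
    """Derive the S-Box: GF(2^8) inversion followed by the AES affine transform."""
    sbox = []
    for value in range(256):
        inverse = next((c for c in range(1, 256) if gmul(value, c) == 1), 0)
        mixed = inverse
        for shift in range(1, 5):
            mixed ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(mixed ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x8D, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def g_function(word: list, batch: int) -> list:
    rotated = word[1:] + word[:1]                # 1. RotWord
    substituted = [SBOX[b] for b in rotated]     # 2. SubBytes
    substituted[0] ^= RCON[batch]                # 3. Rcon, first byte only
    return substituted


def expand_key(key: bytes) -> list:
    """Return the KeySchedule as a list of 4-byte words."""
    nk = len(key) // 4                # 4, 6 or 8 KeyWords
    total_words = 4 * (nk + 6 + 1)    # (rounds + 1) RoundKeys of 4 words each
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, total_words):
        temp = words[i - 1]
        if i % nk == 0:
            temp = g_function(temp, i // nk)     # first word of a new batch
        elif nk == 8 and i % nk == 4:
            temp = [SBOX[b] for b in temp]       # extra SubBytes step of AES256
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
schedule = expand_key(key)
print(bytes(schedule[4]).hex())   # W4: a0fafe17
print(len(schedule) * 32)         # 1408
```

The first four words of `schedule` are the cipher-key itself, which is why the key is visible at the start of the KeySchedule.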
1. SubBytes
SubBytes transforms the input using a Substitution Box (S-Box): every single byte of the input is substituted by a different byte from the S-Box. This results in a non-linear relation between the input and the output.
2. ShiftRows
Organizes the 16-byte block into a [4x4] matrix/grid with rows and columns. In this step, the rows are rotated to the left: the first row is untouched, the second row shifts by one position, the third by two, and the fourth by three. This ensures bytes are spread out over different columns than they started in, achieving diffusion.
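A small Python sketch of this step is shown below. It assumes the block fills the grid column by column (the layout FIPS-197 uses), so the byte at row r and column c sits at index r + 4·c; the function name `shift_rows` is hypothetical.

```python
def shift_rows(state: bytes) -> bytes:
    """Rotate row r of the 4x4 grid left by r positions.

    Assumption: the 16 bytes are stored column by column,
    i.e. state[row + 4 * col].
    """
    out = bytearray(16)
    for col in range(4):
        for row in range(4):
            out[row + 4 * col] = state[row + 4 * ((col + row) % 4)]
    return bytes(out)


shifted = shift_rows(bytes(range(16)))
print(list(shifted))
# [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
```

In the output, the first column now holds bytes that started in four different columns (0, 5, 10, 15), which is the spreading effect described above.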
3. MixColumns
In MixColumns there is a constant AES [4x4] matrix, which is multiplied with each [4x1] column of the [4x4] matrix from the ShiftRows operation. Multiplying a [4x4] matrix with a [4x1] column results in a new [4x1] column. This blends values within a column: if you change 1 byte of an input column, all 4 bytes of the output column change. The 4 column operations do not depend on each other and can be executed in parallel. The matrix multiplication is performed in GF(2⁸), which ensures that the result of each multiplication is still a valid byte and that the step is reversible for decryption. In the last round this step is skipped.
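The column multiplication can be illustrated with the sketch below. The constant matrix is the one from the AES specification; the helper names `gmul` and `mix_column` are hypothetical, and the input column is a commonly used MixColumns test vector.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B          # reduce so the result stays a valid byte
        b >>= 1
    return result


# Constant AES MixColumns matrix.
MIX = [
    (2, 3, 1, 1),
    (1, 2, 3, 1),
    (1, 1, 2, 3),
    (3, 1, 1, 2),
]


def mix_column(column: list) -> list:
    """Multiply the constant [4x4] matrix with one [4x1] column."""
    out = []
    for row in MIX:
        acc = 0
        for coefficient, byte in zip(row, column):
            acc ^= gmul(coefficient, byte)   # addition in GF(2^8) is XOR
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
print([hex(b) for b in mix_column(column)])   # ['0x8e', '0x4d', '0xa1', '0xbc']

# Changing a single input byte changes all four output bytes.
changed = mix_column([0xDB ^ 0x01, 0x13, 0x53, 0x45])
print(all(a != b for a, b in zip(mix_column(column), changed)))   # True
```

Because every coefficient in the matrix is non-zero, a change in one input byte always reaches every output byte of that column.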
4. AddRoundKey
Finally, the algorithm takes a RoundKey (128 bits), which is part of the expanded cipher-key from the KeySchedule, and XORs it with the 16 bytes of data. This locks the scrambled data to the specific cipher-key.
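Putting the four steps together, the encryption of a single 16-byte block could look like the sketch below. It is a compact educational sketch, not the Radikant-Crypto-C implementation and not hardened against timing attacks; all function names are hypothetical, the block is assumed to be stored column by column, and the S-Box is derived on the fly to keep the example self-contained. The key and plaintext are the AES-128 example vector from FIPS-197 Appendix C.1.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    sbox = []
    for value in range(256):
        inverse = next((c for c in range(1, 256) if gmul(value, c) == 1), 0)
        mixed = inverse
        for shift in range(1, 5):
            mixed ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(mixed ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x8D, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key):
    """Return the KeySchedule as a list of 16-byte RoundKeys."""
    nk = len(key) // 4
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, 4 * (nk + 7)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]
            temp[0] ^= RCON[i // nk]
        elif nk == 8 and i % nk == 4:
            temp = [SBOX[b] for b in temp]
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    flat = [b for word in words for b in word]
    return [flat[i:i + 16] for i in range(0, len(flat), 16)]


def sub_bytes(state):
    return [SBOX[b] for b in state]


def shift_rows(state):
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    out = []
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(2, col[r]) ^ gmul(3, col[(r + 1) % 4])
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return out


def add_round_key(state, round_key):
    return [s ^ k for s, k in zip(state, round_key)]


def encrypt_block(key, block):
    round_keys = expand_key(key)
    state = add_round_key(list(block), round_keys[0])      # initial AddRoundKey
    for round_key in round_keys[1:-1]:                     # all rounds but the last
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_key)
    state = add_round_key(shift_rows(sub_bytes(state)), round_keys[-1])  # no MixColumns
    return bytes(state)


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block(key, plaintext).hex())   # 69c4e0d86a7b0430d8cdb78070b4c55a
```

Passing a 24 or 32 byte key selects 12 or 14 rounds automatically, because the number of RoundKeys follows from the key length.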
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 7c | 77 | 7b | f2 | 6b | 6f | c5 | 30 | 01 | 67 | 2b | fe | d7 | ab | 76 |
| 1 | ca | 82 | c9 | 7d | fa | 59 | 47 | f0 | ad | d4 | a2 | af | 9c | a4 | 72 | c0 |
| 2 | b7 | fd | 93 | 26 | 36 | 3f | f7 | cc | 34 | a5 | e5 | f1 | 71 | d8 | 31 | 15 |
| 3 | 04 | c7 | 23 | c3 | 18 | 96 | 05 | 9a | 07 | 12 | 80 | e2 | eb | 27 | b2 | 75 |
| 4 | 09 | 83 | 2c | 1a | 1b | 6e | 5a | a0 | 52 | 3b | d6 | b3 | 29 | e3 | 2f | 84 |
| 5 | 53 | d1 | 00 | ed | 20 | fc | b1 | 5b | 6a | cb | be | 39 | 4a | 4c | 58 | cf |
| 6 | d0 | ef | aa | fb | 43 | 4d | 33 | 85 | 45 | f9 | 02 | 7f | 50 | 3c | 9f | a8 |
| 7 | 51 | a3 | 40 | 8f | 92 | 9d | 38 | f5 | bc | b6 | da | 21 | 10 | ff | f3 | d2 |
| 8 | cd | 0c | 13 | ec | 5f | 97 | 44 | 17 | c4 | a7 | 7e | 3d | 64 | 5d | 19 | 73 |
| 9 | 60 | 81 | 4f | dc | 22 | 2a | 90 | 88 | 46 | ee | b8 | 14 | de | 5e | 0b | db |
| A | e0 | 32 | 3a | 0a | 49 | 06 | 24 | 5c | c2 | d3 | ac | 62 | 91 | 95 | e4 | 79 |
| B | e7 | c8 | 37 | 6d | 8d | d5 | 4e | a9 | 6c | 56 | f4 | ea | 65 | 7a | ae | 08 |
| C | ba | 78 | 25 | 2e | 1c | a6 | b4 | c6 | e8 | dd | 74 | 1f | 4b | bd | 8b | 8a |
| D | 70 | 3e | b5 | 66 | 48 | 03 | f6 | 0e | 61 | 35 | 57 | b9 | 86 | c1 | 1d | 9e |
| E | e1 | f8 | 98 | 11 | 69 | d9 | 8e | 94 | 9b | 1e | 87 | e9 | ce | 55 | 28 | df |
| F | 8c | a1 | 89 | 0d | bf | e6 | 42 | 68 | 41 | 99 | 2d | 0f | b0 | 54 | bb | 16 |
The S-Box or substitution box is a 256 byte array that substitutes a byte for another value. A byte has 256 possible values, therefore there are 256 substitutes in the S-Box. You can use the above table as a lookup table: 1st hex digit = row and 2nd hex digit = column.
Examples
0xFF → 0x16
0x79 → 0xb6
0xb3 → 0x6d
The S-Box and the Inverse S-Box are functional inverses. This means that if you apply the S-Box to a byte and then apply the Inverse S-Box to the result, you get your original byte back.
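As an illustration, the table and its inverse can be regenerated from the GF(2⁸) inversion mentioned earlier. The sketch below is only one possible way to do this: `gmul` and `build_sbox` are hypothetical helper names, the inverse is found by brute force for clarity rather than speed, and the affine transform with constant 0x63 is the one defined in the AES specification.

```python
def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox() -> list:
    sbox = []
    for value in range(256):
        # Multiplicative inverse in GF(2^8); 0 has no inverse and maps to 0.
        inverse = next((c for c in range(1, 256) if gmul(value, c) == 1), 0)
        # Affine transform: XOR of the inverse with four left rotations, then 0x63.
        mixed = inverse
        for shift in range(1, 5):
            mixed ^= ((inverse << shift) | (inverse >> (8 - shift))) & 0xFF
        sbox.append(mixed ^ 0x63)
    return sbox


SBOX = build_sbox()

# The Inverse S-Box simply swaps index and value.
INV_SBOX = [0] * 256
for value, substitute in enumerate(SBOX):
    INV_SBOX[substitute] = value

print(hex(SBOX[0xFF]), hex(SBOX[0x79]), hex(SBOX[0xB3]))   # 0x16 0xb6 0x6d
print(all(INV_SBOX[SBOX[b]] == b for b in range(256)))     # True
```

The three printed values are the lookups from the Examples above, and the last line confirms that applying the S-Box and then the Inverse S-Box returns the original byte.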
Electronic Codebook (ECB) takes an array of bytes of any length (a multiple of 16 bytes) and encrypts each block of 16 bytes independently. This gives some advantages: every block looks like random noise and the blocks can be processed in parallel. However, since AES is deterministic, identical input blocks always produce identical output blocks under the same key. Data like JSON, text and other structured information is highly repetitive, which inevitably leads to repeating patterns in the encrypted output and thus leaks information. For this reason ECB is considered insecure on its own.
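The pattern leak is easy to reproduce. The sketch below is hedged in one important way: to stay short and dependency-free it does not use AES but a hypothetical stand-in block cipher, `toy_encrypt_block`, a 4-round Feistel permutation built on HMAC-SHA256. Any deterministic block cipher, AES included, shows the same behaviour in ECB mode.

```python
import hashlib
import hmac

BLOCK = 16


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a 4-round Feistel permutation on 16 bytes (NOT AES)."""
    left, right = block[:8], block[8:]
    for round_no in range(4):
        mask = hmac.new(key, bytes([round_no]) + right, hashlib.sha256).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, mask))
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> list:
    """Encrypt every 16-byte block independently and return the blocks."""
    if len(data) % BLOCK:
        raise ValueError("ECB input must be a multiple of 16 bytes")
    return [toy_encrypt_block(key, data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)]


key = b"0123456789abcdef"
# Three identical 16-byte JSON snippets followed by a different one.
plaintext = b'{"status":"ok"} ' * 3 + b'{"status":"no"} '
blocks = ecb_encrypt(key, plaintext)
for block in blocks:
    print(block.hex())
print(blocks[0] == blocks[1] == blocks[2], blocks[2] == blocks[3])   # True False
```

An observer without the key cannot read the blocks, but can still see that the first three are equal and the fourth differs, which is exactly the information ECB leaks.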
Cipher Block Chaining (CBC) In this mode, every single block of data is tangled up with the preceding block. Because the very first block doesn't have a "previous block" to chain to, the computer generates a random 16-byte number called the Initialization Vector (IV) and XORs the first block of plaintext with the IV. The IV serves as a seed block to kickstart the process and is usually sent in plain text to the other party.
The intermediate result of XORing the IV with plain-text-1 is IMR-1, which is encrypted with the cipher-key using AES, producing cipher-text-1. Cipher-text-1 is then XOR’ed with plain-text-2 to produce IMR-2, which is encrypted with the cipher-key to produce cipher-text-2. Information ripples through the chain, so unlike ECB mode, a change in plain-text-1 affects the whole chain.
CBC encryption is sequential: the next block always depends on the preceding block, so it cannot be executed in parallel and is therefore slow. Decryption, however, can be parallelized. For example, decrypting cipher-text-5 with the cipher-key yields IMR-5, and since cipher-text-4 is already known, XORing IMR-5 with cipher-text-4 recovers plain-text-5 without waiting for any other block.
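The chaining and the independent per-block decryption can be sketched as follows. As an explicit assumption, AES is replaced by a hypothetical invertible stand-in cipher (`toy_encrypt_block` / `toy_decrypt_block`, a small Feistel network over HMAC-SHA256) so that the example stays short; the CBC logic around it is the same for AES. Padding is not covered, so the plaintext is assumed to be a multiple of 16 bytes.

```python
import hashlib
import hmac
import os

BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _mask(key: bytes, round_no: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES encryption of one block (NOT AES)."""
    left, right = block[:8], block[8:]
    for round_no in range(4):
        left, right = right, xor(left, _mask(key, round_no, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block."""
    left, right = block[:8], block[8:]
    for round_no in reversed(range(4)):
        left, right = xor(right, _mask(key, round_no, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of 16 bytes")
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        imr = xor(plaintext[i:i + BLOCK], previous)   # IMR-n
        previous = toy_encrypt_block(key, imr)        # cipher-text-n
        out.append(previous)
    return b"".join(out)


def cbc_decrypt_block(key: bytes, iv: bytes, ciphertext: bytes, n: int) -> bytes:
    """Recover plaintext block n (0-based) without touching other plaintext blocks."""
    current = ciphertext[n * BLOCK:(n + 1) * BLOCK]
    previous = iv if n == 0 else ciphertext[(n - 1) * BLOCK:n * BLOCK]
    return xor(toy_decrypt_block(key, current), previous)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    return b"".join(cbc_decrypt_block(key, iv, ciphertext, n)
                    for n in range(len(ciphertext) // BLOCK))


key, iv = os.urandom(16), os.urandom(BLOCK)
plaintext = b"block-one-data!!block-two-data!!block-three-ok!!"
ciphertext = cbc_encrypt(key, iv, plaintext)
print(cbc_decrypt(key, iv, ciphertext))                 # full round trip
print(cbc_decrypt_block(key, iv, ciphertext, 2))        # b'block-three-ok!!'
```

`cbc_decrypt_block` only needs two adjacent ciphertext blocks, which is why the calls for different `n` could run on separate cores, while `cbc_encrypt` has to walk the blocks in order.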
In counter mode (CTR), a 16-byte input (counter) block is generated by concatenating a nonce and a counter. This block is encrypted using AES with the cipher-key to produce a pseudorandom KeyStream block. The KeyStream is then XORed with the plaintext to produce the ciphertext. The counter is incremented for each block. Because blocks do not chain together, the process can be parallelized across multiple CPU cores. The nonce is sent in plain text but must never be reused with a particular cipher-key. Furthermore, the counter must never be reset under the same nonce. If the counter is reset (e.g. for a new session), a new nonce must be generated.
The counter is incremented on every block, generating a new KeyStream block. If an attacker knows the block data in advance (Known-Plaintext Attack) and the data is for example “isadmin: false”, he can simply invert the final XOR operation to deduce the KeyStream of that particular block and then alter the data to “isadmin: true” with a bit-flipping attack, without the receiver knowing the data was tampered with. These attacks pose a security risk especially in communication protocols which follow an easily predictable pattern such as “hello client”.
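The bit-flipping attack can be demonstrated in a few lines. The sketch rests on several assumptions that are not from the text above: AES is replaced by a hypothetical stand-in function `toy_block` (truncated HMAC-SHA256, which is sufficient here because CTR only ever uses the encryption direction), and the counter block is assumed to be a 12-byte nonce followed by a 4-byte big-endian counter. The forged text is padded with a space because the attacker can only replace bytes, not change the length.

```python
import hashlib
import hmac
import os


def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES encryption of one 16-byte block (NOT AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:16]


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Assumed layout: 12-byte nonce followed by a 4-byte counter.
    return toy_block(key, nonce + counter.to_bytes(4, "big"))


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: in CTR mode both are the same XOR with the KeyStream."""
    out = bytearray()
    for i in range(0, len(data), 16):
        stream = keystream_block(key, nonce, i // 16)
        out += bytes(d ^ s for d, s in zip(data[i:i + 16], stream))
    return bytes(out)


key, nonce = os.urandom(16), os.urandom(12)
known = b"isadmin: false"
ciphertext = ctr_xor(key, nonce, known)

# Attacker: knows the plaintext and sees the ciphertext, but has no key.
wanted = b"isadmin: true "
recovered_stream = bytes(c ^ p for c, p in zip(ciphertext, known))
forged = bytes(s ^ w for s, w in zip(recovered_stream, wanted))

# Receiver decrypts the forged ciphertext and notices nothing.
print(ctr_xor(key, nonce, forged))   # b'isadmin: true '
```

Nothing in plain CTR mode lets the receiver distinguish `forged` from a genuine ciphertext, which motivates the authenticated modes that follow.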
Counter with CBC-MAC (CCM) mode prevents bit-flipping attacks. CTR on its own is not tamper-proof and is vulnerable to bit-flipping attacks, without the receiver being able to detect that the message was altered. CBC is used to generate a unique fingerprint of the plain-text that reliably reveals whether the encrypted data has been tampered with. The CTR counter always starts counting from 1 instead of 0, since 0 is reserved for generating KeyStream0, which is later required to generate the TAG.
In CBC mode all preceding blocks affect the current and future blocks to be encrypted. This chaining ensures that flipping a single bit in the plain text results in a completely different tag. All encrypted blocks except the last are discarded, and only the last block is used as the MAC. Finally, the MAC is XOR’ed with KeyStream0 to generate the TAG.
CTR mode is fully parallelizable for both encryption and decryption. However, the TAG generation cannot be parallelized, because in CBC all blocks must be processed serially.
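The structure described above can be sketched as follows. This is a deliberately simplified, CCM-style illustration and not an implementation of the standardized CCM format: real CCM encodes flags, nonce and length into a first block and supports associated data, none of which is modelled here. AES is again replaced by a hypothetical stand-in `toy_block`, and all function names and the nonce/counter layout are assumptions made for the example.

```python
import hashlib
import hmac
import os

BLOCK = 16


def toy_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES block encryption (NOT AES): truncated HMAC-SHA256."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def keystream(key: bytes, nonce: bytes, counter: int) -> bytes:
    return toy_block(key, nonce + counter.to_bytes(4, "big"))


def cbc_mac(key: bytes, message: bytes) -> bytes:
    # The first block carries the message length (a simplification of real CCM).
    mac = toy_block(key, len(message).to_bytes(BLOCK, "big"))
    padded = message + bytes(-len(message) % BLOCK)
    for i in range(0, len(padded), BLOCK):
        mac = toy_block(key, xor(mac, padded[i:i + BLOCK]))  # only the last value is kept
    return mac


def ctr(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # The counter starts at 1; counter 0 is reserved for the TAG.
    return b"".join(xor(data[i:i + BLOCK], keystream(key, nonce, 1 + i // BLOCK))
                    for i in range(0, len(data), BLOCK))


def seal(key: bytes, nonce: bytes, plaintext: bytes):
    tag = xor(cbc_mac(key, plaintext), keystream(key, nonce, 0))   # MAC xor KeyStream0
    return ctr(key, nonce, plaintext), tag


def unseal(key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    plaintext = ctr(key, nonce, ciphertext)
    expected = xor(cbc_mac(key, plaintext), keystream(key, nonce, 0))
    if not hmac.compare_digest(expected, tag):
        raise ValueError("TAG mismatch: message was altered")
    return plaintext


key, nonce = os.urandom(16), os.urandom(12)
ciphertext, tag = seal(key, nonce, b"isadmin: false")
print(unseal(key, nonce, ciphertext, tag))   # b'isadmin: false'

flipped = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    unseal(key, nonce, flipped, tag)
except ValueError as error:
    print(error)                             # TAG mismatch: message was altered
```

The `ctr` calls for different blocks are independent of each other, while the loop in `cbc_mac` must run block by block, which mirrors the parallelization limits described above.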
Galois/Counter Mode (GCM) AES-GCM is one of the most widely used modes for authenticated encryption with associated data (AEAD). While older authenticated modes like CCM pass every block of data through the AES engine twice, GCM optimizes the process. It encrypts the payload using fully parallelizable CTR mode, while simultaneously generating a Message Authentication Code (MAC) using fast Galois field multiplication. This "one-pass" architecture allows modern multi-core processors to encrypt and authenticate large amounts of data with very little overhead.