AES Encryption Algorithm: Implementation and Key Agility, Study notes of Electrical and Electronics Engineering

An overview of the Advanced Encryption Standard (AES) encryption algorithm, focusing on its implementation and key agility. It covers the data structures, parameters, and functions involved in AES encryption, including Substitution, ShiftRow, and KeyAddition. It also discusses key scheduling and its impact on encryption and decryption throughput and latency in hardware architectures.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-2he-2
AES implementations in software and hardware
ECE 746: Lecture 4

AES in Software

Reference software implementation

Iterative cipher

[Figure: flowchart of an iterative cipher. Initial transformation with Round Key[0]; i := 1; Cipher Round with Round Key[i], then i := i + 1, repeated #rounds times while i < #rounds; Final transformation with Round Key[#rounds+1].]

Round function - encryption

    KeyAddition(a, rk[0], BC);

    /* ROUNDS-1 ordinary rounds */
    for (r = 1; r < ROUNDS; r++) {
        Substitution(a, S, BC);
        ShiftRow(a, 0, BC);
        MixColumn(a, BC);
        KeyAddition(a, rk[r], BC);
    }

    /* Last round is special: there is no MixColumn */
    Substitution(a, S, BC);
    ShiftRow(a, 0, BC);
    KeyAddition(a, rk[ROUNDS], BC);

Data structures and parameters

    int rijndaelEncryptRound(word8 a[4][MAXBC], int keyBits, int blockBits,
                             word8 rk[MAXROUNDS+1][4][MAXBC], int rounds)
    {
        int r, BC, ROUNDS;

        /* BC = number of 32-bit columns in the block */
        switch (blockBits) {
        case 128: BC = 4; break;
        case 192: BC = 6; break;
        case 256: BC = 8; break;
        default : return (-2);
        }

        /* The number of rounds depends on the larger of key size and block size */
        switch (keyBits >= blockBits ? keyBits : blockBits) {
        case 128: ROUNDS = 10; break;
        case 192: ROUNDS = 12; break;
        case 256: ROUNDS = 14; break;
        default : return (-3); /* this cannot happen */
        }
        /* ... remainder of the function is not shown on the slide ... */
    }

Reference implementation - SubBytes (2)

    word8 S[256] = {
         99, 124, 119, 123, 242, 107, 111, 197,  48,   1, 103,  43, 254, 215, 171, 118,
        202, 130, 201, 125, 250,  89,  71, 240, 173, 212, 162, 175, 156, 164, 114, 192,
        183, 253, 147,  38,  54,  63, 247, 204,  52, 165, 229, 241, 113, 216,  49,  21,
          4, 199,  35, 195,  24, 150,   5, 154,   7,  18, 128, 226, 235,  39, 178, 117,
          9, 131,  44,  26,  27, 110,  90, 160,  82,  59, 214, 179,  41, 227,  47, 132,
         83, 209,   0, 237,  32, 252, 177,  91, 106, 203, 190,  57,  74,  76,  88, 207,
        208, 239, 170, 251,  67,  77,  51, 133,  69, 249,   2, 127,  80,  60, 159, 168,
         81, 163,  64, 143, 146, 157,  56, 245, 188, 182, 218,  33,  16, 255, 243, 210,
        205,  12,  19, 236,  95, 151,  68,  23, 196, 167, 126,  61, 100,  93,  25, 115,
         96, 129,  79, 220,  34,  42, 144, 136,  70, 238, 184,  20, 222,  94,  11, 219,
        224,  50,  58,  10,  73,   6,  36,  92, 194, 211, 172,  98, 145, 149, 228, 121,
        231, 200,  55, 109, 141, 213,  78, 169, 108,  86, 244, 234, 101, 122, 174,   8,
        186, 120,  37,  46,  28, 166, 180, 198, 232, 221, 116,  31,  75, 189, 139, 138,
        112,  62, 181, 102,  72,   3, 246,  14,  97,  53,  87, 185, 134, 193,  29, 158,
        225, 248, 152,  17, 105, 217, 142, 148, 155,  30, 135, 233, 206,  85,  40, 223,
        140, 161, 137,  13, 191, 230,  66, 104,  65, 153,  45,  15, 176,  84, 187,  22,
    };

Reference implementation - InvSubBytes (2)

    word8 Si[256] = {
         82,   9, 106, 213,  48,  54, 165,  56, 191,  64, 163, 158, 129, 243, 215, 251,
        124, 227,  57, 130, 155,  47, 255, 135,  52, 142,  67,  68, 196, 222, 233, 203,
         84, 123, 148,  50, 166, 194,  35,  61, 238,  76, 149,  11,  66, 250, 195,  78,
          8,  46, 161, 102,  40, 217,  36, 178, 118,  91, 162,  73, 109, 139, 209,  37,
        114, 248, 246, 100, 134, 104, 152,  22, 212, 164,  92, 204,  93, 101, 182, 146,
        108, 112,  72,  80, 253, 237, 185, 218,  94,  21,  70,  87, 167, 141, 157, 132,
        144, 216, 171,   0, 140, 188, 211,  10, 247, 228,  88,   5, 184, 179,  69,   6,
        208,  44,  30, 143, 202,  63,  15,   2, 193, 175, 189,   3,   1,  19, 138, 107,
         58, 145,  17,  65,  79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
        150, 172, 116,  34, 231, 173,  53, 133, 226, 249,  55, 232,  28, 117, 223, 110,
         71, 241,  26, 113,  29,  41, 197, 137, 111, 183,  98,  14, 170,  24, 190,  27,
        252,  86,  62,  75, 198, 210, 121,  32, 154, 219, 192, 254, 120, 205,  90, 244,
         31, 221, 168,  51, 136,   7, 199,  49, 177,  18,  16,  89,  39, 128, 236,  95,
         96,  81, 127, 169,  25, 181,  74,  13,  45, 229, 122, 159, 147, 201, 156, 239,
        160, 224,  59,  77, 174,  42, 245, 176, 200, 235, 187,  60, 131,  83, 153,  97,
         23,  43,   4, 126, 186, 119, 214,  38, 225, 105,  20,  99,  85,  33,  12, 125,
    };

The same function, different table.

ShiftRows

    a b c d                       a b c d    no shift
    e f g h      ShiftRows        f g h e    cyclic shift left by C1=1
    i j k l    ------------>      k l i j    cyclic shift left by C2=2
    m n o p                       p m n o    cyclic shift left by C3=3

    Block size   C1   C2   C3
    128 bits      1    2    3
    192 bits      1    2    3
    256 bits      1    3    4

InvShiftRows

    a b c d                       a b c d    no shift
    f g h e     InvShiftRows      e f g h    cyclic shift left by C1'=3
    k l i j    ------------>      i j k l    cyclic shift left by C2'=2
    p m n o                       m n o p    cyclic shift left by C3'=1

    Block size   C1'  C2'  C3'
    128 bits      3    2    1
    192 bits      5    4    3
    256 bits      7    5    4

Reference implementation ShiftRows, InvShiftRows (1)

    /* shifts[block size][row][direction]:
     * first value of each pair  = number of shifts during encryption,
     * second value of each pair = number of shifts during decryption */
    static word8 shifts[3][4][2] = {
        0, 0,  1, 3,  2, 2,  3, 1,   /* for BC=4, block size=128 bits */
        0, 0,  1, 5,  2, 4,  3, 3,   /* for BC=6, block size=192 bits */
        0, 0,  1, 7,  3, 5,  4, 4    /* for BC=8, block size=256 bits */
    };

Reference implementation ShiftRows, InvShiftRows (2)

    void ShiftRow(word8 a[4][MAXBC], word8 d, word8 BC) {
        /* Row 0 remains unchanged
         * The other three rows are shifted a variable amount.
         * d = 0 selects the encryption shifts, d = 1 the decryption shifts.
         * SC is not defined in this excerpt; it must select the row of
         * shifts[] for the block size, i.e. (BC - 4) / 2 for BC = 4, 6, 8. */
        word8 tmp[MAXBC];
        int i, j;

        for (i = 1; i < 4; i++) {
            for (j = 0; j < BC; j++)
                tmp[j] = a[i][(j + shifts[SC][i][d]) % BC];
            for (j = 0; j < BC; j++)
                a[i][j] = tmp[j];
        }
    }

MixColumns

[Figure: MixColumns maps each column (a0,j, a1,j, a2,j, a3,j) of the 4 x 4 state a to the column (b0,j, b1,j, b2,j, b3,j) of the state b by multiplication with a fixed matrix.]

$$
\begin{pmatrix} b_{0,j} \\ b_{1,j} \\ b_{2,j} \\ b_{3,j} \end{pmatrix}
=
\begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}
\begin{pmatrix} a_{0,j} \\ a_{1,j} \\ a_{2,j} \\ a_{3,j} \end{pmatrix}
$$

Reference implementation MixColumns (1)

    void MixColumn(word8 a[4][MAXBC], word8 BC) {
        /* Mix the four bytes of every column in a linear way */
        word8 b[4][MAXBC];
        int i, j;

        for (j = 0; j < BC; j++)
            for (i = 0; i < 4; i++)
                b[i][j] = mul(2, a[i][j])
                        ^ mul(3, a[(i + 1) % 4][j])
                        ^ a[(i + 2) % 4][j]
                        ^ a[(i + 3) % 4][j];
        for (i = 0; i < 4; i++)
            for (j = 0; j < BC; j++)
                a[i][j] = b[i][j];
    }

InvMixColumns

[Figure: InvMixColumns maps each column (a0,j, a1,j, a2,j, a3,j) to the column (b0,j, b1,j, b2,j, b3,j) by multiplication with the inverse matrix; entries are hexadecimal.]

$$
\begin{pmatrix} b_{0,j} \\ b_{1,j} \\ b_{2,j} \\ b_{3,j} \end{pmatrix}
=
\begin{pmatrix} \mathtt{E} & \mathtt{B} & \mathtt{D} & \mathtt{9} \\ \mathtt{9} & \mathtt{E} & \mathtt{B} & \mathtt{D} \\ \mathtt{D} & \mathtt{9} & \mathtt{E} & \mathtt{B} \\ \mathtt{B} & \mathtt{D} & \mathtt{9} & \mathtt{E} \end{pmatrix}
\begin{pmatrix} a_{0,j} \\ a_{1,j} \\ a_{2,j} \\ a_{3,j} \end{pmatrix}
$$

AddRoundKey

[Figure: the 4 x 4 state a is added to the 4 x 4 round key k to give the 4 x 4 state b.]

$$ b_{i,j} = a_{i,j} \oplus k_{i,j} $$

• simple bitwise addition (xor) of round keys

Reference implementation AddRoundKey

    void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
        /* Exor corresponding text input and round key input bytes */
        int i, j;

        for (i = 0; i < 4; i++)
            for (j = 0; j < BC; j++)
                a[i][j] ^= rk[i][j];
    }

Optimized software implementation

Mathematical description of the round transformations (1)

SubBytes:

$$ b_{i,j} = S[a_{i,j}] $$

ShiftRows (column index sums are taken mod BC = Nb):

$$
\begin{pmatrix} c_{0,j} \\ c_{1,j} \\ c_{2,j} \\ c_{3,j} \end{pmatrix}
=
\begin{pmatrix} b_{0,j} \\ b_{1,j+C1} \\ b_{2,j+C2} \\ b_{3,j+C3} \end{pmatrix}
$$

Mathematical description of the round transformations (2)

MixColumns:

$$
\begin{pmatrix} d_{0,j} \\ d_{1,j} \\ d_{2,j} \\ d_{3,j} \end{pmatrix}
=
\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix}
\begin{pmatrix} c_{0,j} \\ c_{1,j} \\ c_{2,j} \\ c_{3,j} \end{pmatrix}
$$

AddRoundKey:

$$
\begin{pmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{pmatrix}
=
\begin{pmatrix} d_{0,j} \\ d_{1,j} \\ d_{2,j} \\ d_{3,j} \end{pmatrix}
\oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}
$$

Mathematical description of the entire round

$$
\begin{pmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{pmatrix}
=
\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix}
\begin{pmatrix} S[a_{0,j}] \\ S[a_{1,j+C1}] \\ S[a_{2,j+C2}] \\ S[a_{3,j+C3}] \end{pmatrix}
\oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}
$$

Expanding the matrix product column by column:

$$
\begin{pmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{pmatrix}
=
S[a_{0,j}] \begin{pmatrix} 02 \\ 01 \\ 01 \\ 03 \end{pmatrix}
\oplus
S[a_{1,j+C1}] \begin{pmatrix} 03 \\ 02 \\ 01 \\ 01 \end{pmatrix}
\oplus
S[a_{2,j+C2}] \begin{pmatrix} 01 \\ 03 \\ 02 \\ 01 \end{pmatrix}
\oplus
S[a_{3,j+C3}] \begin{pmatrix} 01 \\ 01 \\ 03 \\ 02 \end{pmatrix}
\oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}
$$

Fast implementation of the entire round (1)

Each product of an S-box output with a fixed column is precomputed as a 32-bit table entry, so that

$$
\begin{pmatrix} e_{0,j} \\ e_{1,j} \\ e_{2,j} \\ e_{3,j} \end{pmatrix}
=
T_0[a_{0,j}] \oplus T_1[a_{1,j+C1}] \oplus T_2[a_{2,j+C2}] \oplus T_3[a_{3,j+C3}]
\oplus
\begin{pmatrix} k_{0,j} \\ k_{1,j} \\ k_{2,j} \\ k_{3,j} \end{pmatrix}
$$

Look-up Tables T

Definition of T-tables

    static const u32 Te0[256] = {
        0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,
        0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,
        0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,
        0xe7fefe19U, 0xb5d7d762U, 0x4dababe6U, 0xec76769aU,
        0x8fcaca45U, 0x1f82829dU, 0x89c9c940U, 0xfa7d7d87U,
        0xeffafa15U, 0xb25959ebU, 0x8e4747c9U, 0xfbf0f00bU,
        . . . . . . . . . . . . .

Fast implementation of the entire round (2)

With four tables T0, T1, T2, T3:

$$ e_j = T_0[a_{0,j}] \oplus T_1[a_{1,j+C1}] \oplus T_2[a_{2,j+C2}] \oplus T_3[a_{3,j+C3}] \oplus K_j $$

    Memory: 4 x 256 x 4 B = 4 kB for tables T0, T1, T2, T3
    Time:   20 LUTs + 16 XORs

With the single table T0 and byte rotations:

$$ e_j = T_0[a_{0,j}] \oplus \mathrm{RotByte}\big(T_0[a_{1,j+C1}] \oplus \mathrm{RotByte}\big(T_0[a_{2,j+C2}] \oplus \mathrm{RotByte}(T_0[a_{3,j+C3}])\big)\big) \oplus K_j $$

    Memory: 256 x 4 B = 1 kB of memory for table T0
    Time:   20 LUTs + 16 XORs + 12 ROTs

Here e_j and K_j denote the 32-bit columns (e0,j, e1,j, e2,j, e3,j) and (k0,j, k1,j, k2,j, k3,j).

Optimized Decryption

AES in Hardware

Top level block diagram

[Figure: top level block diagram. Blocks: input interface, output interface, key scheduling, encryption/decryption, memory of internal keys, control unit. Signals: data input, data output, key, control.]

Primary parameters of hardware implementations of secret-key block ciphers

Latency: time to encrypt/decrypt a single block of data (Mi to Ci).

Throughput: number of bits encrypted/decrypted in a unit of time (Mi, Mi+1, Mi+2 to Ci, Ci+1, Ci+2).

$$ \text{Throughput} = \frac{\text{Block\_size} \cdot \text{Number\_of\_blocks\_processed\_simultaneously}}{\text{Latency}} $$

Key agility

Packet-switched networks

[Figure: timeline of packets E_K1(packet1), E_K2(packet2), E_K3(packet3) arriving over time. Key scheduling for K1 is followed by decryption of packet 1, then key scheduling for K2 is followed by decryption of packet 2.]

$$ \text{Overhead}\,[\%] = \frac{\text{Time of key scheduling}}{\text{Time of single block decryption}} \cdot \frac{1}{\text{\# blocks/packet}} $$

Hardware Architectures of Block Ciphers

Typical Internal Structure of a Secret-Key Block Cipher

[Figure: the iterative cipher flowchart. Initial transformation with Round Key[0]; i := 1; Cipher Round with Round Key[i], i := i + 1, repeated #rounds times while i < #rounds; Final transformation with Round Key[#rounds+1].]

Non-feedback Counter Mode - CTR

[Figure: blocks M0, M1, M2, ..., MN-1, MN are each combined with the output of an encryption unit E fed with IV, IV+1, IV+2, ..., IV+N-1, IV+N to give the ciphertext blocks.]

$$ C_i = M_i \oplus \mathrm{AES}(IV + i) \quad \text{for } i = 0..N $$

Architectures Suitable for Non-Feedback Cipher Modes

[Figure: four architectures.
a) register, combinational logic and multiplexer: one round, no pipelining
b) k registers and multiplexer: one round = k pipeline stages
c) k registers per round and multiplexer: round 1, round 2, ..., round K, each = k pipeline stages
d) k registers per round: round 1, round 2, ..., round #rounds, each = k pipeline stages]

Inner-Round Pipelining

[Figure: one round divided into pipeline stage 1, pipeline stage 2, ..., pipeline stage k, separated by register 1, register 2, ..., register k, with a multiplexer at the input.]

Inner-Round Pipelining: Timing

[Figure: timing diagram for k=2 showing CLK, inputs P1 to P6 on IN and outputs C1 to C4 on OUT, with the interval #rounds · (k · reduced_clock_period) marked.]

Full Mixed Inner- and Outer-round Pipelining

[Figure: round 1, round 2, ..., round #rounds, each = k pipeline stages with k registers.]

Total # of pipeline stages = #rounds · k

Throughput vs. Area for Architectures with Pipelining

[Figure: plot of Throughput against Area. Curves: basic architecture; inner-round pipelining (k=2 up to k_opt); outer-round pipelining (K=2, K=3, K=4); mixed inner- and outer-round pipelining (K=2, K=3).]

Implementation of Basic Operations of AES

S-box and Inversion in GF(2^8)

S-box in hardware: an 8 x 8 S-box (inputs x1, x2, ..., x8; outputs y1, y2, ..., y8) is implemented either as a ROM with 2^8 words, an 8-bit address and an 8-bit output (2^8 · 8 bits), or as direct logic.

Permutation in hardware: a permutation P of n inputs x1, x2, x3, ..., xn-1, xn onto n outputs y1, y2, y3, ..., yn-1, yn is only a reordering of wires.
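The reference code calls a GF(2^8) multiplication routine mul() and stores the S-box as a table, and the hardware slides describe the S-box as an inversion in GF(2^8). The following is a minimal sketch of that arithmetic, assuming the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 and the standard affine constant 0x63. The names xtime, gf_mul, gf_inv, rotl8 and aes_sbox_entry are illustrative and do not come from the slides; the inverse is computed by a slow but simple exponentiation, which a real implementation would replace with a table or dedicated logic.

```c
#include <stdint.h>

/* Multiply by x (that is, by 02) in GF(2^8), reducing modulo
 * x^8 + x^4 + x^3 + x + 1 (0x11b). */
uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Shift-and-add multiplication of two field elements. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;       /* add the current multiple of a */
        a = xtime(a);     /* next multiple: a * x */
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse computed as a^254 (since a^255 = 1 for a != 0).
 * The input 0 is mapped to 0, as the S-box construction requires. */
uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1;
    int i;
    for (i = 0; i < 254; i++)
        r = gf_mul(r, a);
    return r;
}

/* Rotate an 8-bit value left by n bits, 0 < n < 8. */
uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* One S-box entry: inversion in GF(2^8) followed by the affine
 * transformation. */
uint8_t aes_sbox_entry(uint8_t a)
{
    uint8_t b = gf_inv(a);
    return (uint8_t)(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3)
                       ^ rotl8(b, 4) ^ 0x63);
}
```

Calling aes_sbox_entry(x) for x = 0..255 should reproduce the table S listed in the reference implementation; for example, inputs 0 and 1 give 99 and 124, the first two table entries. The same gf_mul could stand in for the mul() call used by MixColumn.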
T-box Based Architecture

[Figure: encryption datapath built from T tables. Labels: state bytes a_{i,j} (8 bits each, i = 0..3, j = 0..3), T tables T_i[a_{i,j}] with 32-bit outputs, round key words K_j (j = 0..3), Encryption XOR Network, output columns e_j (j = 0..3), 128-bit buses.]

Multiplication Elimination

[Figure: Encryption Round. Labels: Encryption input (128 bits), ShiftRow, T_i[a_{i,j}] and S[a_{i,j}] for i = 0..3, j = 0..3, round key (four 32-bit words, 128 bits), Encryption feedback output, Encryption final output.]

Since the MixColumns operation is not performed in the last round of encryption, the last round needs to be treated in a special way. In this round, S-boxes need to be used instead of T-boxes.

Inverse Multiplication Elimination

[Figure: decryption datapath built from inverse T tables. Labels: state bytes a_{i,j} (8 bits each, i = 0..3, j = 0..3), T^-1 tables T_i^-1[a_{i,j}] with 32-bit outputs, round key words K_j (j = 0..3), Decryption XOR Network, output columns d_j (j = 0..3), 128-bit buses.]

[Figure: Decryption Round. Labels: Decryption input (128 bits), InvShiftRow, T_i^-1[a_{i,j}] and S^-1[a_{i,j}] for i = 0..3, j = 0..3, round key (four 32-bit words, 128 bits), Decryption feedback output, Decryption final output.]

Since the InvMixColumns operation is not performed in the last round of decryption, the last round needs to be treated in a special way.

[Figure: combined datapath. Labels: Data input, Encryption round, Decryption round, round key, Encryption input, Decryption input, Encryption feedback output, Decryption feedback output, Encryption final output, Decryption final output, Data output.]

AES Specific Optimizations

Sharing Look-up Tables between SubBytes and InvSubBytes

[Figure: a shared inversion in GF(2^8), combined with the affine transformation for SubBytes and with the inverse affine transformation for InvSubBytes.]

Rijndael in Hardware - units shared between encryption and decryption

[Figure: Substitution-Linear Transformation Network. Encryption path: ShiftRow, MixColumn, subkey. Decryption path: InvShiftRow, subkey, InvMixColumn.]

Hardware implementation - top level diagram

[Figure: 128-bit input, ByteSub, encryption block, decryption block, 128-bit output; all buses are 128 bits wide.]
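The T tables used in the fast software round and in the T-box architecture fold SubBytes and the MixColumns multiplications into one 32-bit look-up. The sketch below shows one way to build T0 from an S-box table and to compute one output column with T0 plus byte rotations, next to a direct SubBytes and MixColumns computation for comparison. It assumes that the most significant byte of a word holds row 0, which is consistent with the first Te0 entry 0xc66363a5 for S[0] = 0x63; the function names build_te0, rot_byte, round_column_ttable and round_column_direct are illustrative, not taken from the slides. The inputs b0..b3 stand for the already shifted bytes a0,j, a1,j+C1, a2,j+C2, a3,j+C3.

```c
#include <stdint.h>

/* Multiply by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* T0[x] packs S[x] times the first MixColumns column (02, 01, 01, 03),
 * with row 0 in the most significant byte. */
void build_te0(const uint8_t sbox[256], uint32_t te0[256])
{
    int x;
    for (x = 0; x < 256; x++) {
        uint8_t s  = sbox[x];
        uint8_t s2 = xtime(s);            /* 02 * S[x] */
        uint8_t s3 = (uint8_t)(s2 ^ s);   /* 03 * S[x] */
        te0[x] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16)
               | ((uint32_t)s << 8)   |  (uint32_t)s3;
    }
}

/* Rotate a column down by one byte position: (r0,r1,r2,r3) -> (r3,r0,r1,r2).
 * Applying it 1, 2 or 3 times to T0[x] yields T1[x], T2[x], T3[x]. */
uint32_t rot_byte(uint32_t w)
{
    return (w >> 8) | (w << 24);
}

/* One output column using only T0 and three rotations. */
uint32_t round_column_ttable(const uint32_t te0[256],
                             uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3,
                             uint32_t round_key)
{
    return te0[b0]
         ^ rot_byte(te0[b1] ^ rot_byte(te0[b2] ^ rot_byte(te0[b3])))
         ^ round_key;
}

/* The same column computed directly: SubBytes, MixColumns, AddRoundKey. */
uint32_t round_column_direct(const uint8_t sbox[256],
                             uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3,
                             uint32_t round_key)
{
    uint8_t s0 = sbox[b0], s1 = sbox[b1], s2 = sbox[b2], s3 = sbox[b3];
    uint8_t e0 = (uint8_t)(xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3);
    uint8_t e1 = (uint8_t)(s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3);
    uint8_t e2 = (uint8_t)(s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3));
    uint8_t e3 = (uint8_t)((xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3));
    return (((uint32_t)e0 << 24) | ((uint32_t)e1 << 16)
          | ((uint32_t)e2 << 8)  |  (uint32_t)e3) ^ round_key;
}
```

Passing the table S from the reference implementation to build_te0 should reproduce the Te0 values listed in the slides. The single-table variant trades three rotations per column for a fourfold reduction in table memory, which matches the 1 kB versus 4 kB comparison; in the last round, where MixColumns is skipped, the plain S-box is used instead.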