is constructed by composing two transformations: the multiplicative inverse in the
finite field $\mathbb{F}(2^8)$, followed by an affine transformation over $\mathbb{F}(2)$. The S-Box can be
seen as a substitution table where for each eight-bit input there is a unique eight-bit
output value.
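The two-step construction can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the helper names (gf_mul, gf_inverse, affine) are chosen here for clarity, and the sketch assumes the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 and the affine constant 0x63.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # reduce when the degree reaches 8
            a ^= 0x11B
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    # The multiplicative group has order 255, so a^254 = a^(-1).
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b: int) -> int:
    """Affine transformation over GF(2): four rotations XORed, plus 0x63."""
    def rotl8(x: int, n: int) -> int:
        return ((x << n) | (x >> (8 - n))) & 0xFF
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# The full substitution table: one unique 8-bit output per 8-bit input.
SBOX = [affine(gf_inverse(x)) for x in range(256)]
```

In practice the table is usually precomputed and stored, or the inverse is computed with dedicated logic; the brute-force exponentiation above only serves to make the definition explicit.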
The ShiftRows operation changes the byte position in the state. It rotates each
row by different offsets to obtain a new state. In particular, the first row is unchanged;
the second row is rotated one byte position to the left, the third row is rotated two
positions, and the fourth row three positions. ShiftRows is a linear transformation.
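As an illustration, the row rotation can be written as follows. The sketch assumes the state is held as a list of four rows of four bytes each; this representation and the function name shift_rows are choices made here for readability.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state left by r byte positions (row 0 unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Example state whose entries simply number the byte positions.
example = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shifted = shift_rows(example)
```

Since the operation only permutes byte positions, it involves no arithmetic on the byte values themselves.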
The MixColumns operation works column-wise, altering all the bytes of the same column
and combining the four bytes in each column. It treats a column as a third-degree
polynomial with coefficients in $\mathbb{F}(2^8)$ and produces the new column by multiplying
it with a constant matrix.
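The matrix multiplication on a single column can be sketched as below. The sketch assumes the standard AES constant matrix with entries 2, 3, 1, 1 (rotated per row) and the AES reduction polynomial; the names gf_mul, MIX_MATRIX and mix_column are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


# Constant matrix of the MixColumns transformation (assumed: standard AES).
MIX_MATRIX = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def mix_column(col):
    """Multiply one 4-byte column by the constant matrix over GF(2^8)."""
    out = []
    for row in MIX_MATRIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)   # sums in GF(2^8) are XORs
        out.append(acc)
    return out


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Every output byte depends on all four input bytes of the column, which is what makes the operation a diffusion step.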
The round function is parameterized by a key expansion function that derives
a round key from the original cipher key for each round. AddRoundKey
adds the corresponding round key to the current State using bit-wise XOR. The details
of the key expansion procedure are not needed for the self-consistency of this chapter
and can be found in [142].
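The key addition itself is a plain byte-wise XOR, as the following sketch shows. It assumes the State and the round key are both given as 16-byte sequences; the function name add_round_key and the example values are illustrative, and the key expansion is deliberately left out.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte state with a 16-byte round key."""
    if len(state) != len(round_key):
        raise ValueError("state and round key must have the same length")
    return bytes(s ^ k for s, k in zip(state, round_key))


# Arbitrary example values, not taken from any test vector.
state = bytes(range(16))
round_key = bytes([0xA5] * 16)
masked = add_round_key(state, round_key)
```

Because XOR is its own inverse, applying the same round key a second time restores the original State, so the operation needs no separate inverse function.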
The decryption process follows directly from the encryption process by using
at each round the corresponding inverse functions, i.e., InvSubBytes, InvMixColumns,
and InvShiftRows. The functions are executed in the opposite order to encryption, and so are the round keys.
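The principle of undoing each step in reverse order can be seen on a deliberately reduced toy example. The sketch below is not AES and offers no security: it keeps only a row rotation and a key addition per round, omits the S-Box and MixColumns steps, and uses arbitrary round keys. All function names are hypothetical.

```python
def shift_rows(state):
    """Rotate row r left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r right by r positions, undoing shift_rows."""
    n = 4
    return [row[n - r:] + row[:n - r] for r, row in enumerate(state)]


def add_round_key(state, key):
    """Byte-wise XOR of two 4x4 byte matrices (self-inverse)."""
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, key)]


def toy_encrypt(state, round_keys):
    # Each toy round: rotate rows, then add the round key.
    for key in round_keys:
        state = add_round_key(shift_rows(state), key)
    return state


def toy_decrypt(state, round_keys):
    # Inverse steps in the opposite order, with round keys taken in reverse.
    for key in reversed(round_keys):
        state = inv_shift_rows(add_round_key(state, key))
    return state


plain = [[4 * r + c for c in range(4)] for r in range(4)]
keys = [[[(17 * i + 4 * r + c) & 0xFF for c in range(4)] for r in range(4)]
        for i in range(3)]
cipher = toy_encrypt(plain, keys)
recovered = toy_decrypt(cipher, keys)
```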
6.2.2 AES Hardware Implementations
Since the standardization of the AES in 2001, several papers have been published
proposing different hardware implementations, to cope with different requirements
and goals such as throughput, speed, target (i.e., FPGA or ASIC), power consumption,
area, and resistance to side-channel attacks. This section presents the main classes
of solutions, from the architectural point of view to the optimization of each block.
At a high level, three architectures are possible: iterated, loop-unrolled, and
pipelined. In the iterated architecture the circuit's data path implements the basic
functions of one round, and the data is iterated several times in order to obtain the
final result. This architecture leads to the smallest (and the slowest) implementation.
Loop-unrolled architectures implement two (or more) rounds per clock cycle,
and the execution of the algorithm is iterated. They achieve the highest speed at
the cost of greater area. Finally, pipelined architectures increase the throughput of
encryption/decryption by processing multiple blocks of data simultaneously. They
are realized by inserting some registers so that each pipeline stage can be one round
function (or sub-function), the whole round, or more than one round (in the case of
a loop-unrolled architecture).
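The trade-off between the three architectures can be made concrete with an idealized cycle-count model. The sketch below rests on several assumptions that are not part of the text: 10 rounds (as in AES-128), one clock cycle per implemented round group, and no cycles spent on I/O or the initial key addition. It also ignores that unrolling lengthens the critical path and thus the clock period, and that pipelining only pays off when consecutive blocks are independent of each other. The function names are hypothetical.

```python
import math


def iterated_cycles(n_blocks: int, rounds: int = 10) -> int:
    """One round per cycle; blocks are processed one after another."""
    return n_blocks * rounds


def unrolled_cycles(n_blocks: int, rounds: int = 10, unroll: int = 2) -> int:
    """'unroll' rounds per cycle; blocks are still processed sequentially."""
    return n_blocks * math.ceil(rounds / unroll)


def pipelined_cycles(n_blocks: int, stages: int = 10) -> int:
    """One block enters per cycle; the last block leaves after the fill latency."""
    if n_blocks == 0:
        return 0
    return stages + n_blocks - 1


# Cycle counts for 100 independent blocks under the assumptions above.
summary = {
    "iterated": iterated_cycles(100),
    "unrolled_x2": unrolled_cycles(100),
    "pipelined_10_stages": pipelined_cycles(100),
}
```

Cycle counts alone do not determine speed: a fair comparison would also have to account for the achievable clock frequency and the area of each design.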
Since area is one of the main concerns when dealing with embedded systems,
such as smart cards, instead of implementing a data path that contains the whole
round functions (i.e., with 16 instances of S-Boxes and four MixColumns as in [267,
357]) several works propose reducing the width of the data path in iterated-based