Digital Signal Processing Reference
Finally, the fourth cycle completes the computation of the column, and the values are latched in the
shadow registers:
$$
\begin{aligned}
SR_0 &= R_1 + C_{03} = 2C_{00} + 3C_{01} + C_{02} + C_{03} \\
SR_1 &= R_2 + C_{03} = C_{00} + 2C_{01} + 3C_{02} + C_{03} \\
SR_2 &= R_3 + 3C_{03} = C_{00} + C_{01} + 2C_{02} + 3C_{03} \\
SR_3 &= R_0 + 2C_{03} = 3C_{00} + C_{01} + C_{02} + 2C_{03}
\end{aligned}
$$
For in-place computation, these values from the shadow registers are moved into the locations of
byte-addressable memory that are used to calculate these values. This requires generating index
values in a particular pattern. The logic that generates addressing patterns for in-place computation
is explained in the next section. The MC module multiplies all the columns and saves the values in
the data memory.
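The four-cycle column computation can be illustrated with a minimal Python sketch. It assumes the standard AES MixColumns coefficients (1, 2 and 3), where "+" denotes XOR and multiplication is in GF(2^8). The names `xtime`, `gf_mul`, `MC_COEFF` and `mix_column_byte_serial` are hypothetical, and the sketch keeps each accumulator in a fixed position instead of modelling the register rotation of the actual design.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def gf_mul(coeff, b):
    """Multiply byte b by a MixColumns coefficient (1, 2 or 3)."""
    if coeff == 1:
        return b
    if coeff == 2:
        return xtime(b)
    if coeff == 3:
        return xtime(b) ^ b
    raise ValueError("MixColumns only uses coefficients 1, 2 and 3")


# Row i holds the coefficients applied to C00..C03 to form output byte i.
MC_COEFF = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]


def mix_column_byte_serial(column):
    """Consume one input byte per cycle and accumulate four partial sums.

    Simplification (an assumption of this sketch): the accumulators do not
    rotate; only the arithmetic of the four cycles is reproduced.
    """
    acc = [0, 0, 0, 0]                      # first row of registers, reset per column
    for cycle, byte in enumerate(column):   # cycles 0..3 bring in C00..C03
        for row in range(4):
            acc[row] ^= gf_mul(MC_COEFF[row][cycle], byte)
    shadow = list(acc)                      # latched after the fourth cycle
    return shadow


result = mix_column_byte_serial([0xDB, 0x13, 0x53, 0x45])
print([hex(v) for v in result])
```

Because only one byte enters per cycle, a single set of GF(2^8) multipliers can be reused for all four output bytes, which is what makes the 8-bit datapath possible.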
The first row of registers is also reset every fourth cycle for computing the multiplication of the
next column. An enable signal latches the values of the final result in the second row of shadow
registers. These values are then saved in the data memory by in-place addressing. As the same HW
block is time-shared for performing the next round of the AES algorithm, the SR requires reading the
fifteenth value in the fourth cycle for the MC computation in the next round. The value is still in the
register of MC in this clock cycle. The value is directly passed for ARK computation. The timing
diagram of Figure 13.34 illustrates the generation of different signals.
It is important to appreciate that in many applications no established technique can be applied to
design an effective architecture. The 8-bit AES architecture described in this section is a good
example that demonstrates this assertion. The algorithm is explored by expanding all the computations
in an iteration. Then, intelligently, all dependencies are resolved by tracing out single-byte
operations to make the algorithm work on a single byte of input.
13.3.3.4 Byte-Systolic Fully Parallel Architecture
This section presents a novel byte-systolic fully parallel AES architecture [38]. The architecture
works on byte in-place indexing. A byte of plain text is input to the architecture and a byte of cipher
text is output in every clock cycle after an initial latency of 16 × 15 cycles. All the rounds of
encryption are implemented by cascading all the stages with pipeline logic. The data is input to the
first stage in byte-serial fashion. When the 16 bytes have been written in the first data RAM block, the
stage starts executing the first round of the algorithm. At the same time the input data for the second
frame is written in the RAM block by employing byte in-place addressing. This scheme writes the
input data at locations that are already used in the current cycle of the design. For example, the first
stage reads the RAM in row-shifted order reading indices 0, 5, 10 and 15 in the first four cycles. The
four bytes of input data for the second frame are written at these locations in the first RAM block.
The four tables in the first row of Figure 13.35 show the write addresses for the first four frames in the
first RAM block for data. These address patterns repeat after every four frames. The memory locations
are given column-wise. The memory location is numbered in the corner of each box, while the indices
of the values written in these locations are in the center. The four tables in the second row show the
sequential ordering of these indices for reading in row-shifted order. The column-wise numbers show
the memory locations that are read in a sequence to input the data in the specified format.
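The byte in-place addressing scheme can be checked with a small simulation. The sketch below is a model under stated assumptions, not the address generation unit of the design: it assumes a 16-location RAM, a column-wise state layout, and that each incoming byte of the next frame is written to the location read in the same cycle. The names `row_shifted_order` and `simulate_in_place` are hypothetical.

```python
def row_shifted_order():
    """State indices (column-wise numbering) in row-shifted read order."""
    return [4 * ((col + row) % 4) + row for col in range(4) for row in range(4)]


def simulate_in_place(num_frames):
    """Model one 16-byte RAM shared by consecutive frames.

    Returns (patterns, streams):
      patterns[f][k] = RAM address where input byte k of frame f is written
      streams[f]     = (frame, byte index) pairs read out while processing frame f
    """
    order = row_shifted_order()
    write_addr = list(range(16))          # frame 0 is written sequentially
    ram = [None] * 16
    for k in range(16):
        ram[write_addr[k]] = (0, k)

    patterns, streams = [], []
    for frame in range(num_frames):
        patterns.append(list(write_addr))
        stream, next_write_addr = [], []
        for cycle in range(16):
            # Location holding the byte that row-shifted order needs this cycle.
            addr = write_addr[order[cycle]]
            stream.append(ram[addr])
            # The freed location immediately receives byte `cycle` of the next frame.
            ram[addr] = (frame + 1, cycle)
            next_write_addr.append(addr)
        streams.append(stream)
        write_addr = next_write_addr
    return patterns, streams


patterns, streams = simulate_in_place(5)
for f in range(4):
    print("frame", f, "write addresses:", patterns[f])
```

In this model the second frame's first four bytes land at locations 0, 5, 10 and 15, every frame is still read out in row-shifted order, and the write-address pattern of the fifth frame equals that of the first, consistent with the four-frame repetition described for Figure 13.35.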
Figure 13.36 shows the systolic architecture. Each stage has its own RAM blocks for storing data
for a round and corresponding key. The addressing for reading from the RAM block and writing the
data in the same location is performed using the address generation unit shown with each memory
block. The addressing is done using an index. For four successive frames the value of the index is