Digital Signal Processing Reference
In-Depth Information
h0
h1
h2
h3
h4
h5
h6
h7
s0
s1
s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
s0 s1 s2
Figure 13.19 Parallel comparisons of characters listed in the first row with the three characters in the
search buffer
The compression algorithm is sequential, so an iteration depends on the result of the previous
iteration. It might appear that a single iteration of the algorithm can be parallelized, but multiple
iterations are difficult to parallelize. A technique can be used where the architecture computes all
possible results for the subsequent iterations and then finally makes the selection. This technique is
effective to design high-speed architectures for applications that have dependencies across itera-
tions. An implementation of the technique to design a massively parallel architecture to perform
LZ77 compression at a multi-gigabit rate is presented in [28-30]. A representative architecture is
given in this section.
Here the example assumes a history buffer h of size 8 and maximum match of 3 characters that
requires a search buffer s of size 3. The architecture performs all the matches in parallel. These
matches are shown in the grid of Figure 13.19. The first row shows the values of the history and
search buffers that are compared in parallel with the three characters in the search buffer.
The architecture computes the longest match of the search buffer [s0 s1 s2] to the string of
characters listed in the first row. When it finds a match it computes the length of the match and also
identifies its location. The size of history and the search buffers offer a tradeoff between area and
achievable compression ratio, because the larger the buffer the larger the area.
Figure 13.20(a) computes the best match for one iteration of the algorithm. Each row computes a
comparison listed in a row of the grid of Figure 13.19. Each comparator compares the two characters
and generates 1 if there is a match. A cascaded AND operation in each row carries forward the 1 for
successive matches. The output of the AND operation is also used in each column to select the row
that results in a successful match. The serial logic is a column is shown in Figure 13.21(a). The logic
ORs the results of all rows in each column, and the last row of multiplexers selects the length of the
maximum match and the multiplexer pointer_mux selects the pointer.
Beside serial realization of the column logic, the logic can be optimized for logarithmic time. The
optimized logic groups the serial logic to work in pairs and is depicted in Figure 13.21(b). This
results in that identifies the location of the match logarithmic rather than serial complexity.
Architecture can be designed that computes multiple iterations of the algorithm. The architecture
is optimized as computations are shared across multiple iterations. Figure 13.22 displays a table
showing all the possible comparisons required for performing multiple iterations of the algorithm,
highlighting the sharing of comparisons in the first three columns.
The first column of the table shows the first iteration. The second column computes the result if no
match is established in the first iteration, and s0 is moved from the search buffer to the history buffer
with coding (0 0 s0). Similarly the third, fourth and fifth columns show if the first iteration
results in match length of 1, 2 and 3, respectively. If the first iteration matches two characters, than
Search WWH ::




Custom Search