Digital Design of Communication Systems - Digital Design of Signal Processing Systems: A Practical Approach

Digital Signal Processing Reference

In-Depth Information

h0

h1

h2

h3

h4

h5

h6

h7

s0

s1

s2

s0 s1 s2

Figure 13.19 Parallel comparisons of characters listed in the first row with the three characters in the

search buffer

The compression algorithm is sequential, so an iteration depends on the result of the previous

iteration. It might appear that a single iteration of the algorithm can be parallelized, but multiple

iterations are difficult to parallelize. A technique can be used where the architecture computes all

possible results for the subsequent iterations and then finally makes the selection. This technique is

effective to design high-speed architectures for applications that have dependencies across itera-

tions. An implementation of the technique to design a massively parallel architecture to perform

LZ77 compression at a multi-gigabit rate is presented in [28-30]. A representative architecture is

given in this section.

Here the example assumes a history buffer h of size 8 and maximum match of 3 characters that

requires a search buffer s of size 3. The architecture performs all the matches in parallel. These

matches are shown in the grid of Figure 13.19. The first row shows the values of the history and

search buffers that are compared in parallel with the three characters in the search buffer.

The architecture computes the longest match of the search buffer [s0 s1 s2] to the string of

characters listed in the first row. When it finds a match it computes the length of the match and also

identifies its location. The size of history and the search buffers offer a tradeoff between area and

achievable compression ratio, because the larger the buffer the larger the area.

Figure 13.20(a) computes the best match for one iteration of the algorithm. Each row computes a

comparison listed in a row of the grid of Figure 13.19. Each comparator compares the two characters

and generates 1 if there is a match. A cascaded AND operation in each row carries forward the 1 for

successive matches. The output of the AND operation is also used in each column to select the row

that results in a successful match. The serial logic is a column is shown in Figure 13.21(a). The logic

ORs the results of all rows in each column, and the last row of multiplexers selects the length of the

maximum match and the multiplexer pointer_mux selects the pointer.

Beside serial realization of the column logic, the logic can be optimized for logarithmic time. The

optimized logic groups the serial logic to work in pairs and is depicted in Figure 13.21(b). This

results in that identifies the location of the match logarithmic rather than serial complexity.

Architecture can be designed that computes multiple iterations of the algorithm. The architecture

is optimized as computations are shared across multiple iterations. Figure 13.22 displays a table

showing all the possible comparisons required for performing multiple iterations of the algorithm,

highlighting the sharing of comparisons in the first three columns.

The first column of the table shows the first iteration. The second column computes the result if no

match is established in the first iteration, and s0 is moved from the search buffer to the history buffer

with coding (0 0 s0). Similarly the third, fourth and fifth columns show if the first iteration

results in match length of 1, 2 and 3, respectively. If the first iteration matches two characters, than

Digital Design of Signal Processing Systems: A Practical Approach

Search WWH ::

Custom Search

Home