Dictionary Techniques - Introduction to Data Compression

Databases Reference

In-Depth Information

T A B L E 5 . 2

Thirty most frequently occurring

pairs of characters in a

41,364-character-long LaTeX

document.

Pair

Count

Pair

Count

eb

1128

ar

314

bt

838

at

313

bb

823

b w

309

th

817

te

296

he

712

bs

295

in

512

db

272

sb

494

bo

266

er

433

io

257

ba

425

co

256

tb

401

re

247

en

392

b $

246

on

385

rb

239

nb

353

di

230

ti

322

ic

229

bi

317

ct

226

Suppose we wish to encode the sequence

abracadabra

The encoder reads the first two characters ab and checks to see if this pair of letters exists in

the dictionary. It does and is encoded using the codeword 101. The encoder then reads the

next two characters ra and checks to see if this pair occurs in the dictionary. It does not, so the

encoder sends out the code for r , which is 100, then reads in one more character, c , to make the

two-character pattern ac . This does exist in the dictionary and is encoded as 110. Continuing

in this fashion, the remainder of the sequence is coded. The output string for the given input

sequence is 101100110111101100000.

A list of the 30 most frequently occurring pairs of characters in an earlier version of this

chapter is shown in Table 5.2 . For comparison, the 30 most frequently occurring pairs of

characters in a set of C programs is shown in Table 5.3 .

In these tables,

b corresponds to a space and nl corresponds to a new line. Notice how

different the two tables are. It is easy to see that a dictionary designed for compressing LaTeX

documents would not work very well when compressing C programs. However, we generally

want techniques that will be able to compress a variety of source outputs. If we want to

compress computer files, we do not want to change techniques based on the content of the file.

Rather, we would like the technique to adapt to the characteristics of the source output. We

discuss adaptive-dictionary-based techniques in the next section.

/

Introduction to Data Compression

Search WWH ::

Custom Search

Home