Context-Based Compression
6.1 Overview
In this chapter, we present a number of techniques that use minimal prior
assumptions about the statistics of the data. Instead, they use the context of the
data being encoded and the past history of the data to provide more efficient
compression. We will look at a number of schemes that are principally used for
the compression of text. These schemes use the context in which the data occurs
in different ways.
6.2 Introduction
In Chapters 3 and 4, we learned that we get more compression when the message that is being
coded has a more skewed set of probabilities. By “skewed” we mean that certain symbols
occur with much higher probability than others in the sequence to be encoded. So it makes
sense to look for ways to represent the message that would result in greater skew. One very
effective way to do so is to look at the probability of occurrence of a letter in the context in
which it occurs. That is, we do not look at each symbol in a sequence as if it had just happened
out of the blue. Instead, we examine the history of the sequence before determining the likely
probabilities of different values that the symbol can take.
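As a rough illustration of how context increases skew, the following sketch compares the entropy of a short string when each symbol is treated independently with the entropy of each symbol given the one preceding symbol. The function names and the sample string are hypothetical and chosen only for this illustration; they are not from the text. Estimates taken from so small a sample overstate the gain, so the numbers should be read as indicative only.

```python
import math
from collections import Counter, defaultdict


def order0_entropy(text):
    """Empirical entropy in bits/symbol, ignoring context."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def order1_conditional_entropy(text):
    """Empirical entropy in bits/symbol of a symbol given its predecessor."""
    # For every context (previous symbol), count which symbols follow it.
    followers_by_context = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        followers_by_context[prev][cur] += 1

    total_pairs = len(text) - 1
    h = 0.0
    for followers in followers_by_context.values():
        context_total = sum(followers.values())
        for c in followers.values():
            # Weight by the joint probability of the pair, and take the
            # log of the probability of the symbol within this context.
            h -= (c / total_pairs) * math.log2(c / context_total)
    return h


# Hypothetical sample text, chosen only for illustration.
sample = "the theory that the other thing then thinned the threat"
h0 = order0_entropy(sample)
h1 = order1_conditional_entropy(sample)
print(f"no context:        {h0:.3f} bits/symbol")
print(f"one-symbol context: {h1:.3f} bits/symbol")
```

On text like this, the second figure comes out lower than the first: once the previous letter is known, the distribution over the next letter is more skewed, which is the property the schemes in this chapter exploit.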
In the case of English text, Shannon [4] showed the role of context in two very interesting
experiments. In the first, a portion of text was selected and a subject (possibly his wife, Mary
Shannon) was asked to guess each letter. If she guessed correctly, she was told that she was
 
 