Context-Based Compression - Introduction to Data Compression

Context-Based Compression

6.1 Overview

In this chapter, we present a number of techniques that use minimal prior assumptions about the statistics of the data. Instead they use the context of the data being encoded and the past history of the data to provide more efficient compression. We will look at a number of schemes that are principally used for the compression of text. These schemes use the context in which the data occurs in different ways.

6.2 Introduction

In Chapters 3 and 4, we learned that we get more compression when the message that is being coded has a more skewed set of probabilities. By “skewed” we mean that certain symbols occur with much higher probability than others in the sequence to be encoded. So it makes sense to look for ways to represent the message that would result in greater skew. One very effective way to do so is to look at the probability of occurrence of a letter in the context in which it occurs. That is, we do not look at each symbol in a sequence as if it had just happened out of the blue. Instead, we examine the history of the sequence before determining the likely probabilities of different values that the symbol can take.
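As a rough illustration of why context increases skew, the sketch below compares the entropy of symbols taken in isolation with the entropy of each symbol given the one symbol that precedes it. The function names, the one-symbol context, and the sample strings are assumptions made for this example and are not taken from the text. On short inputs the measured gain is overstated, because many contexts are seen only once or twice.

```python
import math
from collections import Counter, defaultdict


def zero_order_entropy(text):
    """Bits per symbol when every symbol is treated in isolation."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def first_order_conditional_entropy(text):
    """Bits per symbol when the previous symbol is used as the context."""
    # For each context (previous symbol), count which symbols follow it.
    followers_by_context = defaultdict(Counter)
    for prev, cur in zip(text, text[1:]):
        followers_by_context[prev][cur] += 1

    total_pairs = len(text) - 1
    entropy = 0.0
    for followers in followers_by_context.values():
        context_total = sum(followers.values())
        for count in followers.values():
            # Weight by how often the pair occurs; the probability inside
            # the logarithm is conditioned on the context.
            entropy -= (count / total_pairs) * math.log2(count / context_total)
    return entropy


if __name__ == "__main__":
    # Hypothetical sample text, chosen only for illustration.
    sample = "the theory of the thing is that the other things then follow"
    print("zero-order :", zero_order_entropy(sample))
    print("with context:", first_order_conditional_entropy(sample))

    # A perfectly alternating sequence: each symbol alone is a coin flip,
    # but given the previous symbol the next one is certain.
    alternating = "ab" * 8
    print(zero_order_entropy(alternating))               # 1.0
    print(first_order_conditional_entropy(alternating))  # 0.0
```

The alternating sequence is the extreme case: without context the two symbols are equally likely, while with one symbol of context the distribution is as skewed as it can be.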

In the case of English text, Shannon [4] showed the role of context in two very interesting experiments. In the first, a portion of text was selected and a subject (possibly his wife, Mary Shannon) was asked to guess each letter. If she guessed correctly, she was told that she was
