Mathematical Preliminaries for Lossy Coding - Introduction to Data Compression

Probability models used for the design and analysis of lossy compression schemes differ from those used in the design and analysis of lossless compression schemes. When developing models in the lossless case, we aimed for an exact match: the probability of each symbol was estimated as part of the modeling process. When modeling sources in order to design or analyze lossy compression schemes, we look for a general rather than an exact correspondence. The reasons are more pragmatic than theoretical. Certain probability distribution functions are more analytically tractable than others, and we try to match the distribution of the source with one of these “nice” distributions.

The uniform, Gaussian, Laplacian, and gamma distributions are four probability models commonly used in the design and analysis of lossy compression systems:

Uniform Distribution: As for lossless compression, this is again our ignorance model. If we do not know anything about the distribution of the source output, except possibly the range of values, we can use the uniform distribution to model the source. The probability density function for a random variable uniformly distributed between a and b is

$$
f_X(x) =
\begin{cases}
\dfrac{1}{b-a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
\tag{83}
$$
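Equation (83) translates directly into a piecewise function. The following is a minimal sketch in plain Python; the names `uniform_pdf` and `integrate` are illustrative and not from the text, and the midpoint-rule integration is only a numerical sanity check that the density integrates to 1.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of a random variable uniformly distributed on [a, b], as in Eq. (83)."""
    if a >= b:
        raise ValueError("require a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    a, b = -2.0, 3.0
    print(uniform_pdf(0.0, a, b))  # 0.2, i.e. 1 / (b - a)
    print(uniform_pdf(4.0, a, b))  # 0.0, outside [a, b]
    # Integrate over a range wider than [a, b]; the result should be close to 1.
    print(round(integrate(lambda x: uniform_pdf(x, a, b), a - 1.0, b + 1.0), 4))
```

The only information the model uses is the range [a, b], which is exactly what makes it a reasonable "ignorance" model.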

Gaussian Distribution: The Gaussian distribution is one of the most commonly used probability models for two reasons: it is mathematically tractable and, by virtue of the central limit theorem, it can be argued that in the limit the distribution of interest goes to a Gaussian distribution. The probability density function for a random variable with a Gaussian distribution and mean μ and variance σ² is

$$
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\tag{84}
$$
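The same numerical approach can confirm that the parameters in Equation (84) really are the mean and variance. This is a sketch under stated assumptions: `gaussian_pdf` and `moments` are hypothetical helper names, and truncating the integration at μ ± 10σ assumes the remaining tail mass is negligible.

```python
import math


def gaussian_pdf(x: float, mu: float = 0.0, sigma2: float = 1.0) -> float:
    """Gaussian density with mean mu and variance sigma2, as in Eq. (84)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def moments(pdf, lo: float, hi: float, n: int = 200_000) -> tuple[float, float, float]:
    """Midpoint-rule estimates of the total mass, mean, and variance of pdf on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [pdf(x) * h for x in xs]  # probability mass assigned to each small cell
    mass = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / mass
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / mass
    return mass, mean, var


if __name__ == "__main__":
    mu, sigma2 = 1.5, 4.0
    sd = math.sqrt(sigma2)
    # Integrate over mu +/- 10 standard deviations; the mass outside is negligible.
    mass, mean, var = moments(lambda x: gaussian_pdf(x, mu, sigma2), mu - 10 * sd, mu + 10 * sd)
    print(f"mass={mass:.4f}  mean={mean:.4f}  variance={var:.4f}")
```

With μ = 1.5 and σ² = 4, the printed mass, mean, and variance should come out very close to 1, 1.5, and 4.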

Laplacian Distribution: Many sources that we deal with have distributions with a sharp peak at zero. For example, speech consists mainly of silence, so samples of speech will be zero or close to zero with high probability. Image pixels themselves show no tendency toward small values. However, neighboring pixels are highly correlated, so a large number of pixel-to-pixel differences have values close to zero. In these situations, a Gaussian distribution is not a very close match to the data.
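The step from correlated pixels to small differences can be sketched numerically. This sketch does not use real image data: it substitutes a hypothetical first-order autoregressive (AR(1)) sequence centered at 128 as a stand-in for one row of pixels, and `correlated_row`, `fraction_within`, and every parameter value are chosen purely for illustration.

```python
import random
import statistics


def correlated_row(n, mean=128.0, rho=0.98, noise_sd=2.0, seed=0):
    """Hypothetical stand-in for a row of pixels: an AR(1) process around `mean`.
    With rho close to 1, neighboring samples are strongly correlated."""
    rng = random.Random(seed)
    dev = 0.0  # deviation of the current sample from the mean
    row = []
    for _ in range(n):
        dev = rho * dev + rng.gauss(0.0, noise_sd)
        row.append(mean + dev)
    return row


def fraction_within(values, t):
    """Fraction of values whose magnitude is below t."""
    return sum(abs(v) < t for v in values) / len(values)


if __name__ == "__main__":
    row = correlated_row(10_000)
    diffs = [cur - prev for prev, cur in zip(row, row[1:])]  # pixel-to-pixel differences
    print(f"samples:     variance {statistics.pvariance(row):7.2f}, "
          f"fraction within 5 of zero {fraction_within(row, 5.0):.3f}")
    print(f"differences: variance {statistics.pvariance(diffs):7.2f}, "
          f"fraction within 5 of zero {fraction_within(diffs, 5.0):.3f}")
```

With these settings the samples sit around 128, so essentially none lie near zero, while the large majority of neighbor-to-neighbor differences fall within ±5 of zero and have a far smaller variance. Because the AR(1) noise here is Gaussian, the differences in this sketch are Gaussian too; it illustrates only the concentration of differences near zero, not the shape of their distribution.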

A closer match to such data is the Laplacian distribution, which is peaked at zero. The probability density function for a zero-mean random variable with a Laplacian distribution and variance σ² is

$$
f_X(x) = \frac{1}{\sqrt{2}\,\sigma} \exp\left(-\frac{\sqrt{2}\,|x|}{\sigma}\right)
\tag{85}
$$
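To make "peaked at zero" concrete, the sketch below compares Equation (85) with a zero-mean Gaussian of the same variance. The helper names are hypothetical; the probabilities near zero use the closed forms P(|X| < t) = 1 − exp(−√2 t/σ) for the Laplacian and erf(t/(σ√2)) for the Gaussian, both obtained by integrating the densities above.

```python
import math


def laplacian_pdf(x: float, sigma: float = 1.0) -> float:
    """Zero-mean Laplacian density with variance sigma**2, as in Eq. (85)."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def gaussian_pdf(x: float, sigma: float = 1.0) -> float:
    """Zero-mean Gaussian density with variance sigma**2 (Eq. (84) with mu = 0)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def mass_near_zero(t: float, sigma: float = 1.0) -> tuple[float, float]:
    """Closed-form P(|X| < t) for the Laplacian and Gaussian models, both with variance sigma**2."""
    p_laplace = 1.0 - math.exp(-math.sqrt(2.0) * t / sigma)
    p_gauss = math.erf(t / (sigma * math.sqrt(2.0)))
    return p_laplace, p_gauss


if __name__ == "__main__":
    sigma = 1.0
    ratio = laplacian_pdf(0.0, sigma) / gaussian_pdf(0.0, sigma)
    print(f"peak ratio Laplacian/Gaussian: {ratio:.4f} (sqrt(pi) = {math.sqrt(math.pi):.4f})")
    for t in (0.1, 0.5, 1.0):
        p_l, p_g = mass_near_zero(t * sigma, sigma)
        print(f"P(|X| < {t} sigma): Laplacian {p_l:.3f}, Gaussian {p_g:.3f}")
```

At equal variance, the Laplacian density at zero is √π ≈ 1.77 times the Gaussian density, and within ±0.1σ the Laplacian places about 0.13 of its probability versus about 0.08 for the Gaussian.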

Gamma Distribution: The gamma distribution is even more peaked than the Laplacian distribution, though considerably less tractable. The probability density function for a gamma-distributed random variable with zero mean and variance σ² is
