V_j can then be defined as

    V_j = {x : d(x, y_j) < d(x, y_i) for all i ≠ j}        (4)
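As an illustration of Eq. (4), the following minimal sketch finds the region V_j that contains an input vector x, assuming squared Euclidean distance as the distortion measure d and a small hypothetical two-dimensional codebook:

```python
import numpy as np

def quantize(x, codebook):
    """Return the index j of the output point y_j closest to x,
    i.e. the index of the region V_j from Eq. (4) containing x.
    Squared Euclidean distance is assumed as the distortion measure d."""
    distortions = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(distortions))

# Hypothetical codebook of four two-dimensional output points.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(quantize(np.array([0.9, 0.2]), codebook))  # the closest output point is y_1 = (1, 0)
```

Ties on the boundary (where the two distortions are equal) must be broken by convention; `argmin` here simply picks the lowest index.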
Thus, the quantizer is completely defined by the output points and a distortion measure.
From a multidimensional point of view, using a scalar quantizer for each input restricts
the output points to a rectangular grid. Observing several source output values at once allows
us to move the output points around. Another way of looking at this is that in one dimension
the quantization regions are restricted to be intervals, and the only parameter that we can
manipulate is the size of these intervals. When we divide the input into vectors of some length
n, the quantization regions are no longer restricted to be rectangles or squares. We have the
freedom to divide the range of the inputs in an infinite number of ways.
These examples have shown two ways in which the vector quantizer can be used to improve
performance. In the first case, we exploited the sample-to-sample dependence of the input. In
the second case, there was no sample-to-sample dependence; the samples were independent.
However, looking at two samples together still improved performance.
These two examples can be used to motivate two somewhat different approaches toward
vector quantization. One approach is a pattern-matching approach, similar to the process used
in Example 10.3.1, while the other approach deals with the quantization of random inputs. We
will look at both of these approaches in this chapter.
10.4 The Linde-Buzo-Gray Algorithm
In Example 10.3.1 we saw that one way of exploiting the structure in the source output is to
place the quantizer output points where the source output points (blocked into vectors) are most
likely to congregate. The set of quantizer output points is called the codebook of the quantizer,
and the process of placing these output points is often referred to as codebook design. When
we group the source output in two-dimensional vectors, as in the case of Example 10.3.1,
we might be able to obtain a good codebook design by plotting a representative set of source
output points and then visually locating where the quantizer output points should be. However,
this approach to codebook design breaks down when we design higher-dimensional vector
quantizers. Consider designing the codebook for a 16-dimensional quantizer. Obviously, a
visual placement approach will not work in this case. We need an automatic procedure for
locating where the source outputs are clustered.
This is a familiar problem in the field of pattern recognition. It is no surprise, therefore,
that the most popular approach to designing vector quantizers is a clustering procedure known
as the k-means algorithm, which was developed for pattern recognition applications.
The k-means algorithm functions as follows: Given a large set of output vectors from the
source, known as the training set, and an initial set of k representative patterns, assign each
element of the training set to the closest representative pattern. After an element is assigned,
the representative pattern is updated by computing the centroid of the training set vectors
assigned to it. When the assignment process is complete, we will have k groups of vectors
clustered around each of the output points.
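The assign-then-update loop described above can be sketched as follows. This is a minimal batch variant (all training vectors are assigned first, then the centroids are recomputed); the iteration count and the rule for empty cells are choices of this sketch, not fixed by the text:

```python
import numpy as np

def k_means(training_set, initial_codebook, n_iters=20):
    """Sketch of the k-means procedure: assign each training vector to
    the closest representative pattern, then replace each pattern with
    the centroid of the vectors assigned to it, and repeat."""
    codebook = initial_codebook.copy()
    for _ in range(n_iters):
        # Assignment step: index of the closest representative for each vector.
        dists = np.sum((training_set[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        # Centroid step: update each representative pattern.
        for j in range(len(codebook)):
            members = training_set[labels == j]
            if len(members) > 0:  # keep the old pattern if no vectors were assigned
                codebook[j] = members.mean(axis=0)
    return codebook, labels

# Hypothetical training set: two well-separated clusters of 2-D vectors.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
codebook, labels = k_means(data, data[:2].copy())
```

With this data, the representatives converge to the centroids of the two clusters after a couple of iterations, even though both initial patterns were taken from the same cluster.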