Related Work - Hierarchical Neural Networks for Image Interpretation


Fig. 3.2. Image compression using pruned pyramids: (a) original image of a letter; (b) resolution used after pruning (darker shading corresponds to higher resolution; the compression ratio is 150:1); (c) reconstructed address region; (d) difference between the reconstruction and the original (amplified for visibility).

only at the corresponding positions to verify and to refine the hypotheses. This saves computational costs, compared to a high-resolution search.

Burt and Adelson [38] proposed the use of differences $L_0, L_1, \ldots, L_{k-1}$ between the levels of a Gaussian pyramid as a low-entropy representation for image compression. The set of $L_i$'s is called a Laplacian pyramid. The $L_i$ are computed as pixel-wise differences between $G_i$ and its estimate $\tilde{G}_i = \mathrm{expand}(G_{i+1})$, obtained by supersampling $G_{i+1}$ to the higher resolution and interpolating the missing values. Fig. 3.1(b) shows the Laplacian pyramid for the example. It decomposes the image into a sequence of spatial frequency bands. Perfect reconstruction of $G_0$ is possible when $G_k$ and $L_0, L_1, \ldots, L_{k-1}$ are given, by using the recursion $G_i = \tilde{G}_i + L_i$.
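The construction and the reconstruction recursion can be illustrated with a minimal NumPy sketch. It rests on simplifying assumptions that are not taken from the text: the reduction step averages 2 × 2 blocks instead of applying a Gaussian filter, `expand` replicates pixels instead of interpolating, and the image side lengths are divisible by $2^k$. The function names `reduce_level`, `expand`, `build_pyramids`, and `reconstruct` are hypothetical.

```python
import numpy as np

def reduce_level(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for Gaussian filtering)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand(img):
    """Double the resolution by pixel replication (simplest form of interpolation)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramids(g0, k):
    """Return the levels G_0..G_k and the differences L_0..L_{k-1}."""
    gaussian = [np.asarray(g0, dtype=float)]
    for _ in range(k):
        gaussian.append(reduce_level(gaussian[-1]))
    # L_i = G_i - expand(G_{i+1}), computed pixel-wise
    laplacian = [gaussian[i] - expand(gaussian[i + 1]) for i in range(k)]
    return gaussian, laplacian

def reconstruct(g_k, laplacian):
    """Apply G_i = expand(G_{i+1}) + L_i from the coarsest level down to G_0."""
    g = g_k
    for l_i in reversed(laplacian):
        g = expand(g) + l_i
    return g

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16)).astype(float)
gaussian, laplacian = build_pyramids(image, k=3)
restored = reconstruct(gaussian[-1], laplacian)
```

Because each $L_i$ stores exactly what `expand` fails to predict, the recursion recovers $G_0$ regardless of how crude the chosen reduction and expansion operators are.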

Since for natural images the values of $L_i$ are mostly close to zero, they can be compressed using quantization. The reconstruction proceeds in a top-down fashion. Thus, progressive transmission of images is possible with this scheme.
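A small sketch may help to show how quantization and top-down reconstruction fit together. It assumes a uniform quantizer with a fixed step size, 2 × 2 block averaging for reduction, and pixel replication for expansion; none of these choices comes from the text, and the names `quantize`, `dequantize`, and `progressive_reconstruction` are hypothetical. The receiver obtains a coarse image first and refines it as each further level arrives.

```python
import numpy as np

def reduce_level(img):
    """Halve the resolution by averaging 2x2 blocks (assumed reduction operator)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand(img):
    """Double the resolution by pixel replication (assumed expansion operator)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def quantize(level, step):
    """Uniform quantizer: map each value to the index of its nearest bin."""
    return np.rint(level / step).astype(int)

def dequantize(indices, step):
    """Map bin indices back to representative values."""
    return indices * float(step)

def progressive_reconstruction(g_k, quantized, step):
    """Yield successively finer approximations, coarsest first."""
    g = g_k
    yield g
    for q in reversed(quantized):
        g = expand(g) + dequantize(q, step)
        yield g

# Smooth test image: a ramp, so the difference levels stay small.
y, x = np.mgrid[0:16, 0:16]
image = 8.0 * x + 4.0 * y

k, step = 2, 4.0
levels = [image]
for _ in range(k):
    levels.append(reduce_level(levels[-1]))
laplacian = [levels[i] - expand(levels[i + 1]) for i in range(k)]

quantized = [quantize(l_i, step) for l_i in laplacian]
stages = list(progressive_reconstruction(levels[-1], quantized, step))
max_error = np.max(np.abs(stages[-1] - image))
```

In this open-loop variant every level contributes at most half a quantization step of error, so the final error is bounded by `k * step / 2`.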

Since the pyramid has a tree structure, it can be pruned to reduce its size. This method works well if the significant image details are confined within small regions. Figure 3.2 shows an image of a letter with size 2,048 × 1,412. Most of the area can be represented safely by using only the lower resolution levels, while the higher resolutions concentrate at the edges of the print. Although pruning compresses the image by a ratio of 150:1, the address is still clearly readable.
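The pruning idea can be sketched as follows. The text does not state the pruning criterion, so the rule used here is an assumption: a 2 × 2 block of detail coefficients is kept if any of its values exceeds a threshold in magnitude or if any finer block below it is kept, which keeps the retained set tree-shaped. Reduction by block averaging, expansion by pixel replication, and the names `block_any` and `prune` are likewise hypothetical.

```python
import numpy as np

def reduce_level(img):
    """Halve the resolution by averaging 2x2 blocks (assumed reduction operator)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand(img):
    """Double the resolution by pixel replication (works for bool and float arrays)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def block_any(mask):
    """Logical OR over each 2x2 block of a boolean array."""
    h, w = mask.shape
    return mask.reshape(h // 2, 2, w // 2, 2).any(axis=(1, 3))

def prune(laplacian, threshold):
    """Zero out insignificant 2x2 blocks; return pruned levels and the kept count."""
    pruned, retained = [], 0
    descendants = np.zeros(laplacian[0].shape, dtype=bool)
    for l_i in laplacian:  # finest level first
        # Keep a block if it is significant itself or supports a kept finer block.
        keep = block_any((np.abs(l_i) > threshold) | descendants)
        mask = expand(keep)  # back to the shape of l_i
        pruned.append(np.where(mask, l_i, 0.0))
        retained += int(mask.sum())
        descendants = keep  # has the shape of the next coarser level
    return pruned, retained

# Flat background with a small dark patch, standing in for print on paper.
image = np.full((32, 32), 200.0)
image[4:8, 4:10] = 0.0

k, threshold = 3, 1.0
levels = [image]
for _ in range(k):
    levels.append(reduce_level(levels[-1]))
laplacian = [levels[i] - expand(levels[i + 1]) for i in range(k)]

pruned, retained = prune(laplacian, threshold)
retained += levels[-1].size  # the coarsest level is always stored

restored = levels[-1]
for l_i in reversed(pruned):
    restored = expand(restored) + l_i
ratio = image.size / retained
```

Only the branches of the tree that lead to the patch survive, so the number of stored values depends on the area of significant detail rather than on the image size.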

Another application of image pyramids is hierarchical block matching, proposed by Bierling [31] for motion estimation in video sequences. Since the higher levels of the pyramid are increasingly invariant to translations, image motion is estimated in the coarsest resolution first. The estimated displacement vectors are used as starting
