Related Work - Hierarchical Neural Networks for Image Interpretation


approximation is produced by a low-pass kernel L that is associated with the scaling function φ, while the details are produced by a high-pass kernel H associated with the wavelet ψ. Perfect reconstruction of the signal is possible by upsampling the approximation and the details and convolving with reversed kernels.
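
The analysis and synthesis step described above can be sketched in a few lines of Python. The sketch assumes the orthonormal Haar wavelet, chosen here only because its kernels have two taps; the text does not prescribe a particular wavelet, and the function names are illustrative.

```python
import numpy as np

def haar_analysis(signal):
    """One decomposition level with the Haar kernels (an assumed, minimal choice)."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass kernel L, subsampled by 2
    detail = (even - odd) / np.sqrt(2.0)  # high-pass kernel H, subsampled by 2
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert haar_analysis by interleaving the recombined coefficients."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = haar_analysis(signal)
restored = haar_synthesis(approx, detail)
print(np.allclose(restored, signal))
```

Because the Haar kernels are orthonormal, the approximation and the details together hold exactly as many coefficients as the input and carry the same energy.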

For two-dimensional signals, such as images, the decomposition is applied consecutively to both dimensions, e.g. first to the rows and then to the columns. This yields four types of lower-resolution coefficient images: the approximation produced by applying two low-pass filters (LL), the diagonal details computed with two high-pass kernels (HH), and the vertical and horizontal details produced by a high-pass/low-pass combination (LH and HL). This is illustrated in Figure 3.3. The low-resolution approximation of the signal can be decomposed recursively by applying the same procedure. The resulting representation has the same size as the input image, with 3/4 of the coefficients describing the details of the finest resolution.
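
A minimal sketch of the separable two-dimensional scheme follows, again assuming Haar kernels. The labelling of the mixed subbands is an assumption made for this example (first letter for the row filter, second for the column filter), since conventions differ between authors. The sketch checks that the representation has the size of the input and that the three finest detail subbands hold three quarters of all coefficients.

```python
import numpy as np

def haar_rows(a):
    """Low-pass and high-pass halves along the last axis (Haar, assumed here)."""
    even, odd = a[..., 0::2], a[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_2d(image):
    """One separable level: filter the rows first, then the columns."""
    low, high = haar_rows(image)   # row filtering
    ll, lh = haar_rows(low.T)      # column filtering of the low-pass half
    hl, hh = haar_rows(high.T)     # column filtering of the high-pass half
    return ll.T, lh.T, hl.T, hh.T

def haar_pyramid(image, levels):
    """Recursively decompose the approximation; details are kept per level."""
    approx = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        approx, lh, hl, hh = haar_2d(approx)
        details.append((lh, hl, hh))
    return approx, details

image = np.arange(64, dtype=float).reshape(8, 8)
approx, details = haar_pyramid(image, levels=3)
total = approx.size + sum(band.size for level in details for band in level)
finest_fraction = sum(band.size for band in details[0]) / image.size
print(total, finest_fraction)
```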

Major applications of wavelets are image compression and denoising. Both rely on the fact that most natural images are represented sparsely in wavelet coefficient space. Furthermore, additive zero-mean i.i.d. Gaussian pixel noise spreads uniformly over the coefficients. Thus, setting small coefficients to zero and keeping only the few significant ones yields both compression and noise suppression. Donoho and Johnstone [55] showed that such wavelet shrinkage in an appropriate basis can be a nearly optimal non-linear estimator for noise reduction.
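
The shrinkage idea can be illustrated on a one-dimensional toy signal. The sketch below makes several assumptions that are not taken from the text: a Haar basis, hard thresholding of the detail coefficients only, a known noise level `sigma`, and the threshold sigma * sqrt(2 ln N). The test signal is synthetic and piecewise constant, so that it is sparse in the Haar basis.

```python
import numpy as np

def haar_forward(signal):
    """Full Haar decomposition of a signal whose length is a power of two."""
    approx = np.asarray(signal, dtype=float)
    details = []  # finest level first
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Rebuild the signal, starting from the coarsest level."""
    approx = np.asarray(approx, dtype=float)
    for detail in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def shrink(signal, threshold):
    """Hard thresholding: zero every detail coefficient below the threshold."""
    approx, details = haar_forward(signal)
    kept = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 4.0, 1.0, 3.0], 64)     # synthetic signal, length 256
sigma = 0.5                                      # assumed known noise level
noisy = clean + rng.normal(0.0, sigma, size=clean.size)
threshold = sigma * np.sqrt(2.0 * np.log(clean.size))
denoised = shrink(noisy, threshold)
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(err_noisy, err_denoised)
```

Only a handful of coefficients survive the threshold, which is also what makes the same operation usable for compression.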

Wavelet representations are also used for other computer vision tasks. For instance, local maxima can be tracked through multiple resolutions to extract edges robustly [152]. Since many functions can be used as wavelets, the choice of the basis can be targeted to the application at hand. Coifman et al. [43] proposed to further decompose not only the approximation side of the coefficients, but also the details. This yields a nested sequence of wavelet packet decomposition trees that all form an orthonormal basis of the signal if the wavelet itself is orthonormal.
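
To make the difference to the plain wavelet decomposition concrete, the following sketch splits every node of the tree, detail nodes included, down to a fixed depth. It assumes Haar kernels and a full tree of uniform depth. Both are simplifications for illustration and do not reproduce the method of [43], which selects among the nested trees. Energy preservation is used as a simple check of orthonormality.

```python
import numpy as np

def haar_split(x):
    """Split one node into its low-pass and high-pass children (Haar, assumed)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def packet_level(signal, depth):
    """Leaves of the full wavelet packet tree at the given depth."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(depth):
        # Unlike the plain wavelet transform, every node is split again,
        # not only the approximation.
        nodes = [child for node in nodes for child in haar_split(node)]
    return nodes

signal = np.array([3.0, 1.0, 0.0, 4.0, 8.0, 6.0, 7.0, 5.0])
leaves = packet_level(signal, depth=2)
energy_in = np.sum(signal ** 2)
energy_out = sum(np.sum(leaf ** 2) for leaf in leaves)
print(len(leaves), np.isclose(energy_in, energy_out))
```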

Fourier Transformation. The size of a level in the wavelet representation decreases exponentially with height. Thus, the representational power also decreases. Higher levels of the wavelet decomposition represent only the coarse image structure, but it can be desirable to have a complete representation of the signal at each level of the hierarchy. One way to hierarchically transform one complete representation into another is the fast Fourier transformation (FFT), introduced by Cooley and Tukey [44].

A finite-energy signal $f$ can be decomposed into a sum of sinusoids $\{e^{i\omega x}\}_{\omega \in \mathbb{R}}$: $f(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat{f}(\omega)\, e^{i\omega x}\, d\omega$, where $\hat{f}(\omega) = \int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx$ is the Fourier transformation of $f$. The amplitude of $\hat{f}(\omega)$ describes how much the sinusoidal wave $e^{i\omega x}$ contributes to the signal $f$.

For a discrete signal of length $N = 2^j$, it suffices to sample the frequency $\omega$ $N$ times to form an orthonormal basis. The discrete Fourier transformation (DFT) is then: $F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-i 2\pi k n / N}$ $(k = 0, \ldots, N-1)$. It can be computed efficiently by decomposing an $N$-point DFT into two DFTs of $N/2$ points that process the even samples $f_e(n) = f(2n)$ and the odd samples $f_o(n) = f(2n+1)$ separately:

√
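
The even/odd decomposition can be sketched as a recursive radix-2 procedure and compared against a direct evaluation of the DFT sum. The sketch uses the standard recombination F(k) = F_e(k) + e^{-i 2πk/N} F_o(k) and F(k + N/2) = F_e(k) - e^{-i 2πk/N} F_o(k). It works with the unnormalised sum, which is an assumption; a basis scaled to unit norm would add a constant factor at each level. The function names are illustrative.

```python
import numpy as np

def dft_direct(f):
    """O(N^2) evaluation of F(k) = sum_n f(n) exp(-i 2 pi k n / N)."""
    f = np.asarray(f, dtype=complex)
    n = np.arange(len(f))
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / len(f)) @ f

def fft_recursive(f):
    """Radix-2 recursion for N = 2^j samples (unnormalised, as assumed above)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    if N == 1:
        return f
    F_even = fft_recursive(f[0::2])  # DFT of f_e(n) = f(2n)
    F_odd = fft_recursive(f[1::2])   # DFT of f_o(n) = f(2n + 1)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([F_even + twiddle * F_odd,
                           F_even - twiddle * F_odd])

samples = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0])
print(np.allclose(fft_recursive(samples), dft_direct(samples)))
```

Each level of the recursion halves the problem size, so the direct sum's N² operations reduce to the order of N log N.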
