Mathematical Preliminaries (Image Processing) Part 2

The Wavelet Toolbox

There are many frequently used wavelet bases, and prototyping wavelet-based algorithms is easily accomplished in MATLAB using the Wavelet Toolbox. This toolbox includes a plethora of functions for dealing with wavelet transforms. Since this topic’s focus is on image processing, we restrict our discussion to those capabilities pertaining to the 2D DWT and IDWT. However, the Wavelet Toolbox contains numerous other functions for performing one-dimensional wavelet transforms, as well as functions for analyzing signals using the continuous wavelet transform. This section briefly introduces this toolbox, and describes some of the functions one may find useful in image processing. More detailed information can be found in [6],

The dwt2 function can be used to perform a single-level wavelet decomposition. There are two usage patterns of this function, both of which expect the image matrix to be the first argument. One form of dwt2 is generic in that the user provides the low-pass and high-pass filter coefficients to the function. The other form allows the user to specify certain canned wavelet decompositions, which are selected by treating the second argument as a string specifying the wavelet basis to be used for the wavelet analysis. Thus

tmp17F-125_thumb[2]


where I is a variable containing the pixel data, performs the Haar DWT and returns the approximation coefficients in LL and the three detail coefficient matrices depicted in Figure 6-6 in LH, HL, and HH. Two other common bases are the "Daubechies-4" (D4) and "Daubechies-6" (D6) wavelets4. The D6 wavelet is implemented using six-tap filters, and implementing the 2D D6 DWT using hand-crafted functions similar to Listing 6-2 involves constructing a periodized H matrix that looks like the following:

tmp17F-126_thumb[2]

where {h0, hi, h2, hi, h4, hs} are the analysis low-pass filter coefficients for the D6 wavelet4. If the Wavelet Toolbox is available, dwt2 can be used instead by providing the appropriate string to indicate the desired wavelet. Upon encountering a string indicating which wavelet family to use, dwt2 actually invokes another Wavelet Toolbox function, wfilters, which returns the actual filter coefficients associated with that particular wavelet family. The following code illustrates both methods of performing the D4 and D6 wavelet decomposition on an image I:

tmp17F-127_thumb[2]

The output of dwt2 can be combined into a single matrix via MATLAB’s built-in matrix concatenation operations. For example,

tmp17F-128_thumb[2]

It is oftentimes difficult to visualize the detail coefficient matrices due to the scaling of the coefficients – the wcodemat function is useful for displaying pseudo-color representations of wavelet coefficient matrices. The inverse 2D DWT toolbox function is idwt2 and reconstitutes the original image from the four subbands. For example,

tmp17F-129_thumb[2]

In addition, there is another form of idwt2 that expects the user to pass in the actual synthesis low-pass and high-pass filters for reconstruction of the image. Performing a multi-level wavelet decomposition and corresponding reconstruction is more involved than the single-level case. While it is certainly possible to put dwt2 and idwt2 in a loop, the Wavelet Toolbox provides a few very convenient functions that should be used instead. A multi-level pyramid wavelet analysis can be performed in one fell swoop using wavedec2, an example of which is shown below for a three-level decomposition of an image I:

tmp17F-130_thumb[2]

S is a bookkeeping matrix that maintains size information about each level’s components. Extraction of approximation and detail coefficients from a multi-level decomposition is accomplished via appcoef2 and detcoef2. Extraction of the approximation coefficients via appcoef2 is straightforward; one passes in the coefficient and bookkeeping matrices returned from wavedec2, the level, and either a wavelet family string or the actual synthesis low-pass and high-pass filter coefficients:

tmp17F-131_thumb[2]

Extracting the detail coefficients can be done en masse (for a single level), or each of the LH, HH, and HL subbands may be extracted separately. Extraction of each subband is specified in detcoef2 with a string argument, where ‘h’ refers to the horizontal edge subband (LH), ‘v’ refers to the vertical edge subband (HL), and’d’ refers to the diagonal edge subband (HH). Some examples follow:

tmp17F-132_thumb[2]

Use wrcoef2 to reconstruct either an entire image or certain subbands from a multi-level decomposition. For example, to reconstruct the original image from a three-level 2D D6 DWT, the equivalent IDWT is achieved using this command:

tmp17F-133_thumb[2]

A form similar to detcoef2 is used in wrcoef2 to synthesize a particular subband. The first argument is a string, with the acceptable values being the same as detcoef2, except in wrcoef2 ‘a’ synthesizes the level-n approximation. For example,

tmp17F-134_thumb[2]

Finally, an extremely powerful and sophisticated graphical user interface for the Wavelet Toolbox is available. To access this GUI, type wavemenu at the MATLAB prompt, and for more information refer to [6] or the online help.

Other Wavelet Software Libraries

A variety of free Wavelet software packages and libraries exist for MATLAB. One of the more fully featured of these is the WaveLab package that can be downloaded from http:://www-stat.stanford.edu. The DSP group at Rice University has implemented the Rice Wavelet Toolbox (RWT), portions of which accompany [2] and can be downloaded from http://www-dsp.rice.edu/software/rwt.shtml. Finally, "Amara’s Wavelet Page", accessed at http://www.amara.com/current/wavelet.html is a fabulous one-stop shop for wavelet information and software in particular.

The Intel IPP Library includes several functions for implementing the 2D DWT on Pentium-based systems. As is the case with the rest of the library, the functions are heavily optimized and chances are that a faster implementation targeting the Pentium processor does not exist. Of course, the source code is not available, but nonetheless as we have demonstrated in previous applications having an algorithm working on the desktop in a C/C++ environment has its own benefits. For more details, refer to [7].

Implementing the 2D DWT on the C6416 DSK with IMGLIB

The C6x IMGLIB wavelet functions are a perfect example of tailoring the mathematics of an algorithm to a form commensurate with the computing environment. Recall that we implemented the 2D DWT in 6.1.1 by first performing a series of row DWTs, followed by a series of column DWTs on the transformed data. We expressed this computation in MATLAB rather elegantly by simply transposing the circular convolution and permutation matrices, effectively moving the filtering operation to operate down the columns of the input image. Unfortunately, expressive elegance does not always lend itself to computational efficiency, as was evident in 4.5.4.3 when we saw how inefficient qsort was when used to implement the median filter. Elegance should always take a back seat to swift processing in deeply embedded software.

Performing the 2D DWT on the C6x DSP using IMGLIB is done via the IMG_wave_horz and IMG_wave_vert functions. TPs implementation of the 2D DWT is unique in that the transformation of the image columns is performed in-place, allowing us to avoid an expensive matrix transposition operation, which impacts both storage requirements and computation cycles. Moreover, the functions are constructed in such a manner that making it fairly straightforward to DMA in blocks of pixels during the horizontal (across the rows) and in particular the vertical (down the columns) phases of the DWT. In this fashion we are able to use the 2D DWT in a memory-constrained system, even though the mechanics of the transform diverge somewhat from the standard theoretical formulation. To that end, in this section we develop the forward 2D DWT via IMGLIB on the C6416 DSK in a series of stages, similar to what we did in 4.3 and 4.4 where we optimized linear filtering of images on a step-by-step basis. Our roadmap to the 2D DWT follows:

1. A single-level 2D DWT program that pages blocks of image data in and out of internal memory as needed using memcpy (2d_dwt_single_level. c).

2. Same as (1), except we augment the processing to provide a multi-level 2D DWT (2d_dwt_multi_level. c).

3. A multi-level 2D DWT program that performs some parallel paging of data via DMA operations (2d_dwt_multi_level_dma. c).

The project can be found in the Chap6\IMGLIB_2d_dwt directory, where there are three C source files pertaining to each of the aforementioned programs. To build each project, add the appropriate source file to the project in CCStudio, remembering to remove any of the other C source files from the project to avoid duplication of symbols (failure to do so will result in a linker error). The 256×256 "Lenna" image is included within the image. h file, and due to the in-place nature of the implementation the input and output buffer are one and the same (wcoef s).

The IMGLIB functions limit the length of the low-pass and high-pass wavelet filters to 8 coefficients, which is sufficient for most applications. For those filters defined by less than 8 coefficients, the double-word aligned array containing the coefficients is padded with zeros. These 2D DWT programs all use the D4 wavelet, and the analysis filter coefficients are stored in Q15 format in d4_qmf_Q15 and d4_mqmf_Q15.

Taking advantage of faster on-chip RAM by paging data into buffers strategically placed in on-chip memory is always a priority in data-intensive DSP applications (which is essentially the same as saying it is a priority in all DSP applications), but even more so with the IMGLIB wavelet functions. The IMGLIB wavelet functions are fed 16-bit signed integers, and output the transformed data as 16-bit signed integers. A Q15 format dictates 16-bit output, but the fact that the input must also be 16 bits may seem wasteful at first glance – might it be more efficient for IMG_wave_horz to accept 8 bpp image data? The answer is yes, but only for the case of a single-level wavelet decomposition. To implement a multi-level pyramidal wavelet decomposition, we will continue by repeatedly calling the IMGLIB functions, each time passing them the upper-left quadrant of the previous level’s output – the LL subband. Hence, it makes sense to stipulate that all input data be 16 bits. Additionally, the DWT is directly tied to convolution, and we have seen how the convolution operation requires repeated accesses to the data as one slides across the input signal. As a result, it becomes an absolute necessity to construct the implementation so that it is relatively easy to integrate paging of data with the actual processing.

Verification of these programs can be done by spooling the contents of the wcoefs image buffer to disk using the CCStudio File I/O facilities explained in 3.2.2. The MATLAB read_ccs_data_file, which fully supports signed 8 and 16 bit data, can then be used to import the transformed data into MATLAB. The data can then be visualized using either imshow or if the Image Processing Toolbox is not available, image sc. Note that the buffer length should be set to 32768 when saving the 256×256 16-bit image matrix from within CCStudio, as a 256×256 coefficient matrix contains 64K elements and each line in the CCStudio hex formatted output file consists of a single word, for a total of (256)(256)/2 = 32K elements.

Single-Level 2D DWT

Listing 6-4 shows the contents of 2d_dwt_single_level.c. A single wavelet decomposition is performed in two steps, transformation of the rows in trans form_rows followed by transformation of that output in the orthogonal direction within transform cols. Passing each row of the input image through the low-pass and high-pass filters constituting the Quadrature Mirror Filter is fairly straightforward. In transform_rows, we page in 8 rows at a time of image data into an on-chip memory buffer wvlt_in_buf with a call to the standard C library function memcpy. We then call IMG_wave_horz on a row-by-row basis, feeding the output to another on-chip memory buffer wvlt_out_buf. These 8 rows of processed horizontal data are then paged back out to a temporary storage buffer horzcoefs, which serves as input data for the next phase of the 2D DWT.

Listing 6-4: 2d dwt single level, c.

Listing 6-4: 2d dwt single level, c.

 

 

 

 

Listing 6-4: 2d dwt single level, c.

 

 

 

 

 

 

Listing 6-4: 2d dwt single level, c.

A simplistic implementation of the 2D DWT could then proceed to follow the example of the MATLAB implementations in Listing 6-2 and 6-3, i.e. just transpose horzcoefs and reuse IMG_wave_horz. This would work, but a far cheaper method is achieved though the use of IMG_wave_vert. This function is repeatedly invoked in transform_cols, and it is here where this implementation of the 2D DWT breaks from the traditional model. The usage of IMG_wave_vert is thoroughly explained in the IMGLIB documentation8 – the reason for its somewhat non-intuitive nature is to avoid a matrix transposition. The mechanics of the column-wise transformation is perhaps best garnered through close inspection of the code, but Figure 6-9 illustrates in schematic form how the columns of the horzcoefs matrix are processed. A total of 8 rows of horzcoefs are passed into IMG_wave_vert via an array of pointers (pwvbuf s), and the vertical filter is then traversed across the width of the entire image, thereby yielding a row of low-pass filtered data and a row of high-pass filtered data. This constitutes the final output for this level of the 2D DWT and both of these rows are then paged back out to the wcoefs buffer. The next set of horizontal wavelet coefficients are paged in from external RAM in fetch_horz_wavelet_scanlines. With every invocation, six of the eight pointers are shifted up by two slots, and the next two rows are then fetched via memcpy. After transform_cols completes its march through the horzcoefs matrix, a single level of the wavelet decomposition has been completed and resides within wcoefs.

Note that we could replace memcpy in this and the following program with the DSP_blk_move DSPLIB function in order to gain a modest performance increase. The buffers and lengths used in this application meet the requirements (word-alignment and lengths a multiple of 2), but we choose not to do so because in 6.1.4.3 we use DMA as a much more effective optimization technique.

Next post:

Previous post: