Mathematical Preliminaries (Image Processing) Part 3

Multi-Level 2D DWT

A 2D DWT program that supports multiple wavelet decomposition levels is shown in Listing 6-5. The main complication in a multi-level 2D DWT is the bookkeeping imposed by the dyadic nature of the transform. The transform_rows and transform_cols functions from Listing 6-4 have been modified to accept arguments indicating the current decomposition level and the number of columns in the LL subband. At the first level the number of columns is simply Y_SIZE, and it is halved with each subsequent decomposition level. In MATLAB, it is a relatively simple matter to use that language’s colon notation to slice out the LL subband at each decomposition level. In C we have no such facility, and the situation is further complicated by the flattened matrix buffers.
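To make the bookkeeping concrete, the following is a minimal sketch of how the LL subband dimensions and flattened-buffer offsets might be computed at each level. The function names, the 256x256 dimensions, the use of X_SIZE for the row count, and the placement of the LL subband in the top-left corner of the buffer are assumptions made for illustration; they are not taken from Listing 6-5.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative image dimensions (assumed values, not from the listings). */
enum { X_SIZE = 256, Y_SIZE = 256 };

/* Number of columns in the LL subband that feeds decomposition level
 * `level` (1-based): Y_SIZE at level 1, halved at each later level. */
int ll_cols(int level)
{
    assert(level >= 1);
    return Y_SIZE >> (level - 1);
}

/* Row count, assumed to follow the same halving rule as the columns. */
int ll_rows(int level)
{
    assert(level >= 1);
    return X_SIZE >> (level - 1);
}

/* Offset of LL element (row, col) in the flattened coefficient buffer.
 * Assuming the LL subband occupies the top-left corner, the distance
 * between rows stays Y_SIZE at every level, even though each row holds
 * only ll_cols(level) coefficients of interest. */
size_t ll_offset(int row, int col)
{
    return (size_t)row * Y_SIZE + (size_t)col;
}
```

With these values, ll_cols(3) yields 64 columns, while consecutive LL rows remain 256 elements apart in the buffer. That mismatch between row length and row spacing is exactly what MATLAB's colon notation hides.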


Figure 6-9. IMG_wave_vert in action. Output scan-lines are shaded, and even though they are depicted in this figure in relation to the dimensions of the horzcoefs coefficient matrix, they are actually copied into the 2D DWT output wcoefs. All variables refer to those in Listings 6-4, 6-5, and 6-6. (a) Initial iteration, (b) 3rd iteration, (c) 4th iteration.

As a result, in this program we place the paging of data into internal memory (fetch_data) and the paging of data out to external memory (page_out_contiguous_block) in separate functions. This modularity will help us when we eventually transition to using DMA to move data into and out of on-chip memory.

Listing 6-5: Portions of 2d_dwt_multi_level.c.

The transform_rows function differs from its predecessor in Listing 6-4 in two respects: with each successive level it expands its operating cache of input data from wcoefs (since the total amount of data is a quarter of the previous level), and it uses fetch_data to grab these coefficients. Because of the dyadic nature of the decomposition, the LL subband is a contiguous block only at the initial level. Hence in fetch_data we must march down each row in the LL subband, extracting nCols coefficients, instead of performing a simple block copy like that used in Listing 6-4. The same concept is also used within transform_cols, both during initialization of wvlt_in_buf and when grabbing the next two scan-lines' worth of data in fetch_horz_wavelet_scanlines. In contrast, sending the 2D DWT coefficients produced by IMG_wave_vert back out to wcoefs is a contiguous block copy operation, since we page out the data on a row-by-row basis, and for this operation transform_cols can invoke page_out_contiguous_block.
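The difference between the two access patterns can be sketched in portable C as follows. This is an illustration only, not the code of Listing 6-5: gather_ll_subband and write_scanline are hypothetical helpers, and the 16-bit coefficient type and the pitch parameter (elements between row starts in the full-size buffer) are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Gather an nRows x nCols LL subband from a flattened matrix whose rows
 * start `pitch` elements apart, packing it tightly into dst. One copy is
 * needed per row because successive LL rows are not adjacent in memory
 * once nCols < pitch. */
void gather_ll_subband(const short *src, int pitch,
                       int nRows, int nCols, short *dst)
{
    int r;
    assert(nCols <= pitch);
    for (r = 0; r < nRows; ++r)
        memcpy(dst + (size_t)r * nCols,
               src + (size_t)r * pitch,
               (size_t)nCols * sizeof(short));
}

/* Write one output scan-line of nCols coefficients into row `row` of the
 * flattened matrix. A single row segment is contiguous, so one block
 * copy suffices. */
void write_scanline(const short *line, short *wcoefs,
                    int pitch, int row, int nCols)
{
    assert(nCols <= pitch);
    memcpy(wcoefs + (size_t)row * pitch, line,
           (size_t)nCols * sizeof(short));
}
```

For a 4x4 matrix holding the values 0 through 15, gathering a 2x2 subband with a pitch of 4 produces 0, 1, 4, 5: the second subband row begins a full matrix row after the first, not immediately after it.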

Multi-Level 2D DWT with DMA

Operationally, the program given in Listing 6-6 is identical to that of Listing 6-5. However, in this final IMGLIB-based 2D DWT program we replace the loop in fetch_data and the memcpy in page_out_contiguous_block with a CSL function we have not encountered before, DAT_copy2d [9]. This DMA utility function is ideal for copying subregions of matrices over a DMA channel. A second optimization is that we interleave processing and DMA operations in both the transform_rows and transform_cols functions.

Figure 6-10. Parameters used in DAT_copy2d to perform a DMA copy operation on a matrix of data. The TI documentation refers to the number of rows as lineCnt, the number of bytes in each row as lineLen, and the offset between successive rows as linePitch.

DAT_copy2d is a very flexible function. Clients specify the number of rows to copy and the number of bytes in each row, as well as the number of bytes between the start of one row and the start of the next, which makes a variety of image copy operations possible. The TI documentation refers to this final argument as the "line pitch", whereas both Intel and Microsoft refer to this quantity as "stride". Figure 6-10 depicts the meaning of the final three arguments of DAT_copy2d. The salient point is that DAT_copy2d removes the need to code a loop that kicks off multiple DMA operations, one for each row of input data, which would otherwise be required because the rows are not necessarily contiguous.
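The roles of lineLen, lineCnt, and linePitch can be illustrated with a host-side stand-in that performs the same kind of copy in software. This is a sketch, not the CSL function: copy2d_to_packed is a hypothetical name, it models only the case of a strided source and a packed destination, and it does not model the remaining DAT_copy2d arguments or the asynchronous behavior of a real DMA transfer.

```c
#include <assert.h>
#include <string.h>

/* Software stand-in (NOT the CSL DAT_copy2d): copy lineCnt lines of
 * lineLen bytes each from a source whose lines start linePitch bytes
 * apart into a tightly packed destination. */
void copy2d_to_packed(const void *src, void *dst,
                      unsigned lineLen, unsigned lineCnt,
                      unsigned linePitch)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    unsigned i;
    assert(lineLen <= linePitch);
    for (i = 0; i < lineCnt; ++i)
        memcpy(d + (size_t)i * lineLen,
               s + (size_t)i * linePitch,
               lineLen);
}
```

For example, extracting a 2x2 block that starts at element (1,1) of a 4x4 matrix of 16-bit values would use a source pointer offset by 5 elements, a lineLen of 2*sizeof(short), a lineCnt of 2, and a linePitch of 4*sizeof(short). On the DSP, the loop above is what a single 2D DMA request replaces.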

Listing 6-6: Portions of 2d_dwt_multi_level_dma.c. Functions not shown are identical to those given in Listing 6-5.

In the DMA-optimized linear filtering program of Section 4.4.2 (see Listing 4-7), we exploited the asynchronous CSL DMA API to do useful processing work while the DMA/EDMA controller moved data in the background. The same concept is used here. In transform_rows, while we do need to block on the DMA read operation performed within fetch_data, we do not need to block immediately on the DMA write operation that occurs within page_out_contiguous_block. Rather, we simply continue processing, and only when we are ready to page the next batch of processed data out to external memory do we block on this DMA operation (chances are the operation will have long since concluded, and thus the call to DAT_wait returns immediately). Likewise, within transform_cols, the only serialized blocking is due to fetch_horz_wavelet_scanlines, because it in turn calls the blocking function fetch_data. We do not wait for the output of IMG_wave_vert to be DMA’ed out to memory before proceeding to low-pass and high-pass filter the next chunk of data.
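The deferred-wait pattern described above can be sketched on a host machine with stand-in functions. Everything in this sketch is hypothetical: dma_start and dma_wait merely mimic the shape of an asynchronous start/wait API (the copy completes inside dma_start, whereas real hardware would run it in the background), the doubling loop is a placeholder for the actual filtering, and the two-buffer arrangement is one possible way to defer the wait, not necessarily the one used in Listing 6-6.

```c
#include <assert.h>
#include <string.h>

enum { MAX_CHUNK = 64 };

/* Hypothetical transfer handle for the stand-in DMA API. */
typedef struct { int pending; } dma_xfer;

/* Stand-in: "starts" a transfer. On the host the copy happens here. */
dma_xfer dma_start(void *dst, const void *src, size_t nbytes)
{
    dma_xfer x;
    memcpy(dst, src, nbytes);
    x.pending = 1;
    return x;
}

/* Stand-in: blocks until the transfer is done (a no-op on the host). */
void dma_wait(dma_xfer *x) { x->pending = 0; }

/* Process nChunks chunks of chunkLen samples. Reads block immediately;
 * each write is waited on only just before its buffer is reused, so the
 * write of chunk k can overlap the processing of chunk k+1. */
void process_overlapped(const short *in, short *out,
                        int nChunks, int chunkLen)
{
    short inBuf[MAX_CHUNK];
    short outBuf[2][MAX_CHUNK];
    dma_xfer writes[2] = {{0}, {0}};
    size_t bytes = (size_t)chunkLen * sizeof(short);
    int k, i;

    assert(chunkLen > 0 && chunkLen <= MAX_CHUNK);
    for (k = 0; k < nChunks; ++k) {
        int b = k & 1;                       /* ping-pong buffer index */
        dma_xfer rd = dma_start(inBuf, in + (size_t)k * chunkLen, bytes);
        dma_wait(&rd);                       /* input is needed right now */
        dma_wait(&writes[b]);                /* wait only before reuse */
        for (i = 0; i < chunkLen; ++i)       /* placeholder "filter" */
            outBuf[b][i] = (short)(inBuf[i] * 2);
        writes[b] = dma_start(out + (size_t)k * chunkLen,
                              outBuf[b], bytes);   /* do not wait here */
    }
    dma_wait(&writes[0]);                    /* drain outstanding writes */
    dma_wait(&writes[1]);
}
```

The design point is where the waits sit: the read is followed by an immediate wait because the data is needed at once, whereas the wait for a write is postponed until the buffer it reads from is about to be overwritten.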

In reality, the overall efficiency gain of this DMA-enabled version of the multi-level 2D DWT implementation over the non-DMA version is rather modest. IMG_wave_vert can be used in a more sophisticated fashion than it is here, as described in [8]. The documentation describes a scheme where the working buffer is 10 lines instead of the 8 employed here. Eight lines are preloaded to bootstrap the processing, and then the next 2 lines are fetched in the background during the vertical filter operations. When these filters are done traversing the image, the pointers are moved up by two slots. Meanwhile, the next two lines are fetched via DMA, and the actual image processing continues unabated during the fetch. Hence even more of the processing and block memory moves are performed in parallel, which reduces the overall time needed to perform the 2D DWT.
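One way to picture the 10-line scheme is as a ring of ten line slots, of which eight form the filter window and two receive the background fetch. The index arithmetic below is only our reading of that description, not code from [8]; the names and the modular-index formulation are assumptions.

```c
#include <assert.h>

enum { RING_LINES = 10, WINDOW_LINES = 8, STEP_LINES = 2 };

/* Ring slot holding the i-th line (0 = oldest) of the 8-line window
 * whose first line lives in slot `start`. */
int window_slot(int start, int i)
{
    assert(start >= 0 && start < RING_LINES);
    assert(i >= 0 && i < WINDOW_LINES);
    return (start + i) % RING_LINES;
}

/* Ring slot receiving the j-th (0 or 1) line fetched in the background
 * while the window starting at `start` is being filtered. */
int fetch_slot(int start, int j)
{
    assert(start >= 0 && start < RING_LINES);
    assert(j >= 0 && j < STEP_LINES);
    return (start + WINDOW_LINES + j) % RING_LINES;
}

/* Move the window up by two slots once the current pass is done. */
int advance_window(int start)
{
    assert(start >= 0 && start < RING_LINES);
    return (start + STEP_LINES) % RING_LINES;
}
```

Starting at slot 0, the window covers slots 0 through 7 while slots 8 and 9 are filled. After one advance, the window covers slots 2 through 9 and the two oldest slots, 0 and 1, become the targets of the next fetch.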
