Wavelet-Based Edge Detection (Image Processing) Part 4

Standalone Multiscale Edge Detector (C6416DSK)

At this point, C6416 DSK users might be feeling a bit forlorn, as they have been left out in the cold in our pursuit of an embedded multiscale edge detector. In this section, we rectify the situation by presenting an optimized implementation of the wavelet-based edge detector that takes advantage of some of the unique features of the C6416. Our optimizations are two-fold: high-level modifications involving a change in the overall algorithm and general structure of the implementation, and lower-level optimizations that employ C intrinsics. The project files are located on the CD-ROM in the Chap6\wave_edge\C6416DSK directory.

One facet of the implementations in the previous two sections that begs for attention is that they consume memory needlessly and perform superfluous image subtractions. For starters, there is no reason why, in Listing 6-7, the detail image needs to be calculated for each wavelet decomposition. This particular segmentation algorithm calls for segmenting only the final detail image, and since we never perform the inverse simplified a trous wavelet transform (see Figure 6-10 and the reconstruction formula), we need only calculate the last detail image. Thus, we can do away with the three-dimensional detail matrix as defined in Listing 6-8. Of course, if we happened to be using a more sophisticated segmentation scheme that perhaps looked at wavelet coefficients across different scales, we could no longer take this approach. Yet with this change in hand, it is clear that storing each and every approximation coefficient matrix is also hugely wasteful, for even if we did in fact need to compute the inverse transform, we would only require storage for the current and previous approximation images (where the initial approximation image is the actual input image). The updated wave_edge.h header file is given in Listing 6-11 and reflects these changes. Note that in addition to the drastically reduced memory footprint, we also go back to storing the image buffers as flattened arrays. This layout has two advantages: it becomes easier to page in blocks of image data from external RAM to fast on-chip RAM, and the data is arranged in a way that is better suited to packed data processing using specialized C64x instructions via C intrinsics.
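
As a rough illustration of the storage scheme just described, the following sketch declares flattened buffers and a row-major offset helper. It is not the header from the project files: the image dimensions, the `detail` buffer, the `unsigned char` pixel type, and the helper names are all assumptions made for this example, and the "per-level" byte count is only a crude estimate of the earlier scheme (one approximation and one detail image per level).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical image dimensions; the real header may use other values. */
#define X_SIZE 256 /* columns */
#define Y_SIZE 256 /* rows */
#define N_PIXELS ((size_t)X_SIZE * Y_SIZE)

/* Flattened (1D) buffers: two approximation images that trade roles,
   plus a single detail image (assumed layout, 8-bit pixels). */
static unsigned char approx1[X_SIZE * Y_SIZE];
static unsigned char approx2[X_SIZE * Y_SIZE];
static unsigned char detail[X_SIZE * Y_SIZE];

/* Row-major offset of pixel (row, col) inside a flattened buffer. */
static size_t pixel_offset(size_t row, size_t col)
{
    assert(row < Y_SIZE && col < X_SIZE);
    return row * X_SIZE + col;
}

/* Rough byte count if every level kept its own approximation and
   detail image (an estimate of the earlier scheme, not a measurement). */
static size_t per_level_bytes(size_t levels)
{
    return 2 * levels * N_PIXELS;
}

/* Byte count of the three fixed buffers declared above. */
static size_t fixed_buffer_bytes(void)
{
    return sizeof approx1 + sizeof approx2 + sizeof detail;
}
```

Under these assumptions the fixed buffers occupy the same amount of memory no matter how many decomposition levels are requested, whereas the per-level estimate grows linearly with the level count.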


Listing 6-11: The updated C6416 DSK version of wave_edge.h, featuring flattened arrays and substantially less storage than that defined in Listing 6-7.

In the new wave_decomp_a_trous function, we alternate between using approx1 and approx2 as input to the 2D B3 low-pass filter, with the other buffer serving as the filter's output. Then, at the end of the outermost loop (which iterates over the wavelet decomposition levels), we simply feed those two arrays into calc_detail. That function creates the final detail coefficient image by calculating the absolute value of the difference between the final two approximation images, and it returns the mean of that image, which is then used during segmentation. The three functions that constitute the meat of this program are given in Listing 6-12.
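
The buffer alternation and the single detail computation can be sketched in portable C as follows. This is a host-side illustration only, separate from the listing: it uses no C64x intrinsics, the function names ending in `_sketch` and `lowpass_b3` are hypothetical, the 5×5 kernel is applied without dilation, borders are handled by pixel replication, and the detail image is computed once after the last level. All of these are assumptions of the sketch rather than details confirmed by the text.

```c
#include <assert.h>
#include <stdlib.h>

/* 1D B3 spline taps; the 2D kernel is their outer product scaled by 1/256. */
static const int B3[5] = { 1, 4, 6, 4, 1 };

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* 5x5 B3 low-pass from in[] to out[], both flattened rows*cols buffers.
   Border pixels are replicated (an assumption of this sketch). */
static void lowpass_b3(const unsigned char *in, unsigned char *out,
                       int rows, int cols)
{
    int r, c, i, j;
    for (r = 0; r < rows; ++r)
        for (c = 0; c < cols; ++c) {
            unsigned sum = 0;
            for (i = -2; i <= 2; ++i)
                for (j = -2; j <= 2; ++j) {
                    int rr = clamp(r + i, 0, rows - 1);
                    int cc = clamp(c + j, 0, cols - 1);
                    sum += (unsigned)(B3[i + 2] * B3[j + 2]) * in[rr * cols + cc];
                }
            out[r * cols + c] = (unsigned char)((sum + 128) >> 8); /* round, /256 */
        }
}

/* detail[i] = |a[i] - b[i]|; returns the integer mean of the detail image. */
static unsigned calc_detail_sketch(const unsigned char *a, const unsigned char *b,
                                   unsigned char *detail, int n)
{
    unsigned long sum = 0;
    int i;
    for (i = 0; i < n; ++i) {
        int d = abs((int)a[i] - (int)b[i]);
        detail[i] = (unsigned char)d;
        sum += (unsigned long)d;
    }
    return (unsigned)(sum / (unsigned long)n);
}

/* buf1 holds the input image on entry. Each level filters src into dst and
   then swaps the two roles, so only two approximation buffers are needed. */
static unsigned wave_decomp_sketch(unsigned char *buf1, unsigned char *buf2,
                                   unsigned char *detail,
                                   int rows, int cols, int levels)
{
    unsigned char *src = buf1, *dst = buf2, *tmp;
    int lev;
    assert(levels >= 1);
    for (lev = 0; lev < levels; ++lev) {
        lowpass_b3(src, dst, rows, cols);
        tmp = src; src = dst; dst = tmp;
    }
    /* After the last swap, src is the newest approximation and dst the
       previous one; their absolute difference is the final detail image. */
    return calc_detail_sketch(dst, src, detail, rows * cols);
}
```

Note that the input image is overwritten from the second level onward, which is the price paid for keeping only two approximation buffers.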

Listing 6-12: Optimized versions of the wave_decomp_a_trous, calc_detail, and segment_detail_image functions from the C6416 DSK version of wave_edge.c.

The first two of the modified functions, calc_detail and wave_decomp_a_trous, make extensive use of various compiler intrinsics. In segment_detail, we optimize the performance of the 3×3 averaging filter by replacing array subscripting with pointer arithmetic. We made this tradeoff between readability and performance extensively in the next topic, but Kernighan and Ritchie said it best in [23] that "any operation that can be done by array subscripting can also be done with pointers. The pointer version will be faster but, at least to the uninitiated, somewhat harder to understand." Hopefully, if the reader has gotten this far into the topic, he or she no longer falls under the ranks of the uninitiated! This optimization could (and should) also be applied in the 5×5 convolution loop in wave_decomp_a_trous. The number of operations in the filtering double-loop in segment_detail could be reduced further by employing the technique used in Listing 5-10 (calc_hist) and particularly Listing 4-19 (collect_local_pixel_stats): "shift out" the left-most 1×3 column via subtraction and add only the next 1×3 column for every iteration across the column dimension of the image (the innermost loop). This optimization follows from the fact that we are using a box filter with constant coefficients to compute the localized average. With the 2D B3 spline filter this condition does not hold, and hence we cannot apply the same optimization in wave_decomp_a_trous.
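
Both ideas in the preceding paragraph can be sketched on a generic 3×3 mean filter. The function names below are hypothetical, only interior pixels are written, and the division by 9 is an assumption about how the average is formed; this is not the segment_detail code from the listing. The first function walks three row pointers across the image, and the second keeps a running window sum, subtracting the column that leaves the window and adding the one that enters it.

```c
#include <assert.h>

/* 3x3 mean using pointer arithmetic: three row pointers advance together. */
static void box3x3_ptr(const unsigned char *in, unsigned char *out,
                       int rows, int cols)
{
    int r, c;
    if (rows < 3 || cols < 3) return;
    for (r = 1; r < rows - 1; ++r) {
        const unsigned char *top = in + (r - 1) * cols;
        const unsigned char *mid = top + cols;
        const unsigned char *bot = mid + cols;
        unsigned char *dst = out + r * cols + 1;
        for (c = 1; c < cols - 1; ++c, ++top, ++mid, ++bot) {
            unsigned sum = top[0] + top[1] + top[2]
                         + mid[0] + mid[1] + mid[2]
                         + bot[0] + bot[1] + bot[2];
            *dst++ = (unsigned char)(sum / 9);
        }
    }
}

/* Sum of the three vertically adjacent pixels starting at p. */
static unsigned col_sum(const unsigned char *p, int cols)
{
    return (unsigned)p[0] + p[cols] + p[2 * cols];
}

/* Same filter with a running sum: per output pixel, subtract the column
   leaving the window and add the column entering it. */
static void box3x3_running(const unsigned char *in, unsigned char *out,
                           int rows, int cols)
{
    int r, c;
    if (rows < 3 || cols < 3) return;
    for (r = 1; r < rows - 1; ++r) {
        const unsigned char *left = in + (r - 1) * cols; /* window's left column */
        unsigned char *dst = out + r * cols + 1;
        unsigned sum = col_sum(left, cols) + col_sum(left + 1, cols)
                     + col_sum(left + 2, cols);
        *dst++ = (unsigned char)(sum / 9);
        for (c = 2; c < cols - 1; ++c, ++left) {
            sum -= col_sum(left, cols);     /* shift out the left-most column */
            sum += col_sum(left + 3, cols); /* bring in the next column */
            *dst++ = (unsigned char)(sum / 9);
        }
    }
}
```

In this sketch the running-sum version reads six pixels per output instead of nine. It works only because every tap of the box filter has the same weight, so a column's contribution to the window sum does not depend on its position within the window.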
