Fig. 6.15 Redistributed depth planes
6.5 Implementation and Optimizations
The use of carefully selected and adapted algorithms allows us to exploit the GPU for
general-purpose computations, a technique that is often referred to as general-purpose
GPU computing. Our framework harnesses the powerful computational resources of
the graphics hardware, and maximizes the arithmetic intensity of the algorithm to
ensure real-time performance.
The algorithm is further accelerated by raising the processing granularity from
pixels to tiles, configured through a set of well-defined granularity parameters.
Processing is then performed only on the vertices (i.e., corner points) of the
tiles, and the results for pixels inside each tile are approximated by the inherent
linear interpolation between those vertices.
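The tile-based approximation can be sketched on the CPU as follows. This is a minimal illustration, not the framework's implementation: the function name tile_approximate, the per-pixel function f, and the single tile parameter are hypothetical, and on the GPU the interpolation would be done by the rasterizer rather than by explicit loops.

```python
def tile_approximate(f, width, height, tile):
    """Evaluate f only at tile corners and bilinearly interpolate inside each tile.

    Assumes width >= 2, height >= 2 and tile >= 1 (illustrative constraints).
    """
    # Corner coordinates along each axis; the last corner sits on the image border.
    xs = list(range(0, width - 1, tile)) + [width - 1]
    ys = list(range(0, height - 1, tile)) + [height - 1]

    # The expensive per-pixel function is evaluated once per tile vertex only.
    corner = {(x, y): f(x, y) for y in ys for x in xs}

    out = [[0.0] * width for _ in range(height)]
    for j in range(len(ys) - 1):
        y0, y1 = ys[j], ys[j + 1]
        for i in range(len(xs) - 1):
            x0, x1 = xs[i], xs[i + 1]
            for y in range(y0, y1 + 1):
                v = (y - y0) / (y1 - y0)  # vertical weight in [0, 1]
                for x in range(x0, x1 + 1):
                    u = (x - x0) / (x1 - x0)  # horizontal weight in [0, 1]
                    top = (1 - u) * corner[(x0, y0)] + u * corner[(x1, y0)]
                    bottom = (1 - u) * corner[(x0, y1)] + u * corner[(x1, y1)]
                    out[y][x] = (1 - v) * top + v * bottom
    return out


if __name__ == "__main__":
    # A 9x9 image with 4-pixel tiles needs 3x3 = 9 evaluations instead of 81.
    result = tile_approximate(lambda x, y: 2 * x + 3 * y, 9, 9, 4)
    print(result[5][6])
```

Larger tiles reduce the number of evaluations roughly quadratically, at the cost of a coarser approximation wherever the result is not linear within a tile.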
6.5.1 Improved Camera Data Transfer
Experimental profiling shows that transferring RGB-colored input images to the
graphics card causes a threefold increase in the data transfer time due to the memory
bandwidth bottleneck of PCI Express, i.e., the bus connection between the motherboard
northbridge controller and the GPU. This severely reduces the frame rate to
two-thirds of its maximum capacity.
This bottleneck is effectively tackled by transferring Bayer-pattern images directly
to the video memory. Inserting an additional demosaicing step introduces more
computations, but only one-third of the input image data has to be sent,
effectively increasing the performance by over 30%. The reason is that graphics
hardware favors kernels with high arithmetic intensity, as it performs computations
significantly faster than it transfers data.
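The saving in transferred data can be illustrated with a small sketch. The 640x480 resolution, the 8-bit samples, the RGGB layout, and the nearest-neighbor reconstruction are assumptions made for illustration only; the text does not specify the sensor format or the demosaicing method, and the actual step would run as a GPU kernel rather than in Python.

```python
def transfer_bytes(width, height, channels, bytes_per_sample=1):
    """Bytes that must cross the bus for one frame."""
    return width * height * channels * bytes_per_sample


def demosaic_rggb_nearest(raw):
    """Reconstruct RGB from a Bayer image, one 2x2 RGGB cell at a time.

    raw is a list of rows of integers with even width and height (assumed layout:
    R G on even rows, G B on odd rows). Every pixel in a cell receives the cell's
    red sample, the mean of its two green samples, and its blue sample.
    """
    h, w = len(raw), len(raw[0])
    rgb = [[None] * w for _ in range(h)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) // 2
            b = raw[y + 1][x + 1]
            for dy in (0, 1):
                for dx in (0, 1):
                    rgb[y + dy][x + dx] = (r, g, b)
    return rgb


if __name__ == "__main__":
    rgb_size = transfer_bytes(640, 480, channels=3)    # full RGB frame
    bayer_size = transfer_bytes(640, 480, channels=1)  # one sample per pixel
    print(rgb_size, bayer_size, rgb_size // bayer_size)
    print(demosaic_rggb_nearest([[10, 20], [30, 40]])[0][0])
```

The Bayer frame carries one sample per pixel instead of three, which is where the one-third figure comes from; the reconstruction work is moved to the GPU, where arithmetic is cheap relative to bus transfers.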