Plane Sweeping in Eye-Gaze Corrected, Teleimmersive 3D Videoconferencing - Advances in Embedded Computer Vision


Fig. 6.15 Redistributed depth planes

6.5 Implementation and Optimizations

The use of carefully selected and adapted algorithms allows us to exploit the GPU for general-purpose computations, a technique that is often referred to as general-purpose GPU computing. Our framework harnesses the powerful computational resources of the graphics hardware and maximizes the arithmetic intensity of the algorithm to ensure real-time performance.

The algorithm execution is further accelerated by raising the processing granularity from pixels to tiles, configured through a set of well-defined granularity parameters. Processing is thereby performed only on the vertices (i.e., corner points) of the tiles, and the results for the pixels inside each tile are approximated through inherent linear interpolation.
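
The tile-based idea can be illustrated with a small CPU sketch. It is not the authors' GPU implementation: the function names, the `tile` parameter, and the use of plain Python are assumptions made for illustration only. An expensive per-pixel function is evaluated at the tile corners alone, and every other pixel is filled in by bilinear interpolation between the four corners of its tile.

```python
def bilinear(c00, c10, c01, c11, u, v):
    """Blend four corner values; u and v are fractions in [0, 1]."""
    top = c00 * (1.0 - u) + c10 * u
    bottom = c01 * (1.0 - u) + c11 * u
    return top * (1.0 - v) + bottom * v


def evaluate_tiled(cost, width, height, tile):
    """Evaluate `cost` at tile corners only and interpolate the pixels in between.

    `cost(x, y)` stands in for an expensive per-pixel computation and `tile`
    is a hypothetical granularity parameter (tile edge length in pixels).
    Assumption: width - 1 and height - 1 are positive multiples of `tile`,
    so the corner grid covers the image exactly.
    """
    for size in (width, height):
        if size - 1 < tile or (size - 1) % tile:
            raise ValueError("image size minus one must be a positive multiple of tile")

    # Expensive step: one evaluation per tile vertex.
    corners = {
        (x, y): cost(x, y)
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    }

    # Cheap step: interpolate every pixel from the corners of its tile.
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0 = min(y // tile * tile, height - 1 - tile)  # top edge of the tile
        v = (y - y0) / tile
        for x in range(width):
            x0 = min(x // tile * tile, width - 1 - tile)  # left edge of the tile
            u = (x - x0) / tile
            image[y][x] = bilinear(
                corners[(x0, y0)], corners[(x0 + tile, y0)],
                corners[(x0, y0 + tile)], corners[(x0 + tile, y0 + tile)],
                u, v,
            )
    return image, len(corners)


# Usage: a 9x5 image with 4-pixel tiles needs 3 * 2 = 6 evaluations instead of 45.
image, evaluations = evaluate_tiled(lambda x, y: 2.0 * x + 3.0 * y, 9, 5, 4)
```

Larger tiles reduce the number of expensive evaluations but coarsen the approximation. The result is exact only where the underlying function varies linearly across a tile.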

6.5.1 Improved Camera Data Transfer

Experimental profiling shows that downloading RGB-colored input images to the graphics card causes a threefold increase in the data transfer time due to the memory bandwidth bottleneck of PCI Express, i.e., the bus connection between the motherboard northbridge controller and the GPU. This severely reduces the frame rate to two-thirds of its maximum capacity.
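
One way to see how a threefold transfer time can translate into a frame rate of two-thirds is a simple serial model in which each frame is transferred and then processed. The model and all timings below are assumptions chosen only to reproduce the ratios in the text. They are not measurements from the source, and the baseline is taken to be a single-channel transfer.

```python
def frame_rate(transfer_ms, compute_ms):
    """Frames per second when transfer and computation run back to back."""
    return 1000.0 / (transfer_ms + compute_ms)


# Hypothetical timings, not measured values.
single_channel_transfer_ms = 1.0
compute_ms = 3.0
rgb_transfer_ms = 3.0 * single_channel_transfer_ms  # three channels instead of one

best = frame_rate(single_channel_transfer_ms, compute_ms)  # 1000 / 4 ms
rgb = frame_rate(rgb_transfer_ms, compute_ms)              # 1000 / 6 ms
ratio = rgb / best                                         # 4 / 6 = two-thirds
```

Under this model the two figures agree only when the single-channel transfer takes about a quarter of the frame time. With other proportions the same threefold increase would cost more or less frame rate.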

This bottleneck is effectively tackled by transferring Bayer-pattern images directly to the video memory. Inserting an additional demosaicing step introduces more computations, but only one-third of the input image data has to be sent, which increases performance by more than 30%. The reason is that graphics hardware favors kernels with high arithmetic intensity, as it performs computations significantly faster than it transfers data.
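
The trade-off can be sketched on the CPU as follows. This is not the authors' GPU kernel: the RGGB layout, the 8-bit sample size, the neighborhood-averaging demosaicing scheme, and all function names are assumptions made for illustration. The first function shows why a Bayer frame is one-third the size of an RGB frame, and the second reconstructs the missing color samples after the transfer.

```python
def transfer_bytes(width, height, channels, bytes_per_sample=1):
    """Size of one uncompressed frame (8-bit samples assumed by default)."""
    return width * height * channels * bytes_per_sample


def bayer_channel(x, y):
    """Color sampled at (x, y) in an assumed RGGB mosaic: 0 = R, 1 = G, 2 = B."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic(raw):
    """Rebuild RGB from a Bayer mosaic given as a list of rows.

    Each missing color is the average of that color's samples in the 3x3
    neighborhood (clipped at the borders); the color actually measured at a
    pixel is kept unchanged.
    """
    height, width = len(raw), len(raw[0])
    if height < 2 or width < 2:
        raise ValueError("mosaic must be at least 2x2")
    rgb = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    c = bayer_channel(nx, ny)
                    sums[c] += raw[ny][nx]
                    counts[c] += 1
            pixel = [s / n for s, n in zip(sums, counts)]
            pixel[bayer_channel(x, y)] = float(raw[y][x])  # keep the measured sample
            rgb[y][x] = tuple(pixel)
    return rgb


# Usage: a 640x480 Bayer frame is one-third the size of the RGB frame.
bayer_size = transfer_bytes(640, 480, channels=1)
rgb_size = transfer_bytes(640, 480, channels=3)
```

The extra arithmetic per pixel is small and regular, which is the kind of work the text describes as well suited to graphics hardware.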
