Information Technology Reference
In-Depth Information
Template
t=0.00s
t=1.12s
t=5.22s
t=5.50s
Fig. 7. 3D object tracking based on 2 planar regions tracking. Two adjacent tracking regions are
described in red and blue boxes separately. The color boxes in first raw show the tracked regions
in current images. The warped images are shown in the second row. SIFT tracking results are
shown in the third row. The tracked region by SIFT is also shown with color boxes. Occlusion
happens at t = 5 . 22 s and disappears at t = 5 . 50 s .
homography solution. The tracked regions at t
5 . 50 s show that our
system can still track the 3D region by using SIFT results even when occlusion happens.
=
5 . 22 s and t
=
5
Optimization
Though CUDA uses C language with several extensions which makes it easier than
other GPU languages, to make GPU code highly proficient, carefully optimization must
be exploited and several important factors must be considered. In this section, we de-
scribe our optimization experience in our GPU applications.
5.1
Memory Hierarchy
CUDA provides a hierarchy of memory resources including on-chip memory (register,
shared memory) and off-chip memory (local memory, global memory const and tex-
ture memory) . In our GPU applications, we intensively utilize the fast on-chip shared
memory instead of the long-latency global memory. For example, in kernel “Jesm”, the
computation of J esm matrix needs several intermediate results based on the image gra-
dient. So we first load the image gradient data into shared memory and then continue
other computation from shared memory. By using this “cache” like strategy, we have
greatly reduced the kernel's running time.
Search WWH ::




Custom Search