```cuda
    // Block 1: 32 x 32 responses starting at (x, y)
    if ( (x < DATA_W) && (y < DATA_H) )
        Filter_Response[Get2DIndex(x, y, DATA_W)] = Conv2D(s_Image, threadIdx.y + HALO, threadIdx.x + HALO);

    // Block 2: 32 x 32 responses starting at (x + 32, y)
    if ( ((x + 32) < DATA_W) && (y < DATA_H) )
        Filter_Response[Get2DIndex(x + 32, y, DATA_W)] = Conv2D(s_Image, threadIdx.y + HALO, threadIdx.x + 32 + HALO);

    // Block 3: (32 - 2*HALO) x 32 responses starting at (x + 64, y)
    if (threadIdx.x < (32 - HALO * 2))
    {
        if ( ((x + 64) < DATA_W) && (y < DATA_H) )
            Filter_Response[Get2DIndex(x + 64, y, DATA_W)] = Conv2D(s_Image, threadIdx.y + HALO, threadIdx.x + 64 + HALO);
    }

    // Block 4: 32 x (32 - 2*HALO) responses starting at (x, y + 32)
    if (threadIdx.y < (32 - HALO * 2))
    {
        if ( (x < DATA_W) && ((y + 32) < DATA_H) )
            Filter_Response[Get2DIndex(x, y + 32, DATA_W)] = Conv2D(s_Image, threadIdx.y + 32 + HALO, threadIdx.x + HALO);
    }

    // Block 5: 32 x (32 - 2*HALO) responses starting at (x + 32, y + 32)
    if (threadIdx.y < (32 - HALO * 2))
    {
        if ( ((x + 32) < DATA_W) && ((y + 32) < DATA_H) )
            Filter_Response[Get2DIndex(x + 32, y + 32, DATA_W)] = Conv2D(s_Image, threadIdx.y + 32 + HALO, threadIdx.x + 32 + HALO);
    }

    // Block 6: (32 - 2*HALO) x (32 - 2*HALO) responses starting at (x + 64, y + 32)
    if ( (threadIdx.x < (32 - HALO * 2)) && (threadIdx.y < (32 - HALO * 2)) )
    {
        if ( ((x + 64) < DATA_W) && ((y + 32) < DATA_H) )
            Filter_Response[Get2DIndex(x + 64, y + 32, DATA_W)] = Conv2D(s_Image, threadIdx.y + 32 + HALO, threadIdx.x + 64 + HALO);
    }
} // end of kernel
```
Listing 5.3. Non-separable 2D convolution using shared memory. This listing represents the second part of the kernel, where the convolutions are performed by calling a device function for each block of filter responses. For a filter size of 17 × 17, HALO is 8; the first two blocks each yield 32 × 32 filter responses, the third block yields 16 × 32 filter responses, the fourth and fifth blocks each yield 32 × 16 filter responses, and the sixth block yields the last 16 × 16 filter responses. The parameter HALO can easily be changed to optimize the code for different filter sizes. The code for the device function Conv2D is given in the repository.
5.6 Non-separable 3D Convolution
Three-dimensional convolution between a signal s and a filter f for position [x, y, z] is defined as

$$(s \ast f)[x, y, z] = \sum_{f_x=-N/2}^{N/2} \sum_{f_y=-N/2}^{N/2} \sum_{f_z=-N/2}^{N/2} s[x - f_x, y - f_y, z - f_z] \cdot f[f_x, f_y, f_z].$$