// Compute contribution from third row.
lLoad = vload16(0, in + (offset + width * 2 + 0));
rLoad = vload16(0, in + (offset + width * 2 + 2));

// The middle vector (one pixel to the right of lLoad) is assembled from the
// two loads with a swizzle instead of a third vload16.
lData = convert_short16(lLoad);
mData = convert_short16(
    (uchar16)(lLoad.s12345678, rLoad.s789abcde));
rData = convert_short16(rLoad);

_dx += rData - lData;
_dy -= rData + lData + mData * (short16)2;

// Store the results. Each gradient lies in [-1020, 1020]; after >> 3 it
// lies in [-128, 127] and therefore fits in a signed char.
vstore16(convert_char16(_dx >> 3), 0, dx + offset + width + 1);
vstore16(convert_char16(_dy >> 3), 0, dy + offset + width + 1);
Listing 7.5. Computing contribution from the third row: char16_swizzle.
Compared with the char16 kernel, the number of instruction words increases: by nearly 50% for A-words, and even slightly for LS-words because of instruction scheduling effects. Nevertheless, this kernel can be launched with 128 simultaneous work-items per core. This gives the highest utilization of the A-pipes of all the versions in this study, as well as the best overall performance.
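The swizzle in Listing 7.5 works because the left and right loads overlap the middle one: taking x as the first pixel covered by lLoad, lLoad holds pixels x to x+15 and rLoad holds x+2 to x+17, so pixels x+1 to x+16 are already in registers. The plain C sketch below emulates the 16-byte loads with `memcpy` and checks that the swizzled vector equals a direct load at offset +1. The helpers `load16`, `swizzle_middle` and `swizzle_matches_direct_load` are hypothetical stand-ins, not part of the original kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's vload16 on a row of 8-bit pixels:
   copies 16 consecutive bytes starting at p. */
static void load16(unsigned char out[16], const unsigned char *p) {
    memcpy(out, p, 16);
}

/* Builds the middle vector as Listing 7.5 does with
   (uchar16)(lLoad.s12345678, rLoad.s789abcde). */
static void swizzle_middle(unsigned char m[16],
                           const unsigned char l[16],
                           const unsigned char r[16]) {
    for (int i = 0; i < 8; ++i) {
        m[i] = l[1 + i];     /* lLoad.s1 .. lLoad.s8 -> pixels x+1 .. x+8  */
        m[8 + i] = r[7 + i]; /* rLoad.s7 .. rLoad.se -> pixels x+9 .. x+16 */
    }
}

/* Returns 1 if the swizzled middle vector equals a direct load at offset +1.
   `row` must point to at least 18 readable bytes. */
static int swizzle_matches_direct_load(const unsigned char *row) {
    unsigned char l[16], r[16], m_swz[16], m_direct[16];
    assert(row != NULL);
    load16(l, row + 0);        /* left load:  pixels x   .. x+15 */
    load16(r, row + 2);        /* right load: pixels x+2 .. x+17 */
    load16(m_direct, row + 1); /* the load that the swizzle replaces */
    swizzle_middle(m_swz, l, r);
    return memcmp(m_swz, m_direct, 16) == 0;
}
```

Calling `swizzle_matches_direct_load` on any buffer of at least 18 bytes should return 1, which is the property the kernel relies on when it drops the middle `vload16`.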
7.4.7 Processing Multiple Rows
The kernels presented so far have loaded pixels from three input rows to compute
pixels in a single output row. In general, to compute pixels in n output rows,
n + 2 input rows are needed.
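The benefit of processing several rows follows from this count: a work-item that produces n output rows loads n + 2 input rows, so the number of row loads per output row drops from 3 to (n + 2)/n. A small C sketch of that arithmetic, ignoring image borders (the function names are illustrative, and the image height is assumed to be a multiple of n):

```c
#include <assert.h>

/* Input rows one work-item must load to produce n output rows with a
   3x3 stencil: the n rows themselves plus one extra row above and below. */
static int input_rows_needed(int n) {
    assert(n > 0);
    return n + 2;
}

/* Total input-row loads over an image of `height` output rows when every
   work-item produces n output rows (assumes height is a multiple of n). */
static long total_row_loads(long height, int n) {
    assert(height % n == 0);
    return (height / n) * input_rows_needed(n);
}
```

For a 1080-row image this gives 3240 row loads for n = 1, 2160 for n = 2 and 1620 for n = 4; the savings shrink as n grows, because (n + 2)/n only approaches 1.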
// Compute contribution from third row.
lLoad = vload8(0, in + (offset + width * 2 + 0));
mLoad = vload8(0, in + (offset + width * 2 + 1));
rLoad = vload8(0, in + (offset + width * 2 + 2));

lData = convert_short8(lLoad);
mData = convert_short8(mLoad);
rData = convert_short8(rLoad);

// First output row: this is the bottom row of its 3x3 window.
_dx1 += rData - lData;
_dy1 -= rData + lData + mData * (short8)2;
// Second output row: this is the middle row of its 3x3 window.
_dx2 += (rData - lData) * (short8)2;
// ... (part of the listing omitted in this excerpt)

// Store the results.
vstore8(convert_char8(_dx1 >> 3), 0, dx1 + offset + width + 1);
vstore8(convert_char8(_dy1 >> 3), 0, dy1 + offset + width + 1);
vstore8(convert_char8(_dx2 >> 3), 0, dx2 + offset + width * 2 + 1);
vstore8(convert_char8(_dy2 >> 3), 0, dy2 + offset + width * 2 + 1);
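The point of the two-row kernel is that the third input row is loaded once but used twice: as the bottom row of the first output row's window and as the middle row of the second's. The scalar C sketch below models this for four input rows and can be checked against a direct 3x3 Sobel evaluation of each output row. `TwoRowResult`, `sobel_two_rows_reused` and `sobel_3x3` are hypothetical names; the dy sign convention (top row adds r + l + 2m, bottom row subtracts it) is an assumption about parts of the kernel not shown here, and the sketch compares raw sums before the >> 3 scaling.

```c
#include <assert.h>

/* Per-row Sobel weights for the top, middle and bottom rows of a window,
   applied to h = r - l (dx) and s = r + l + 2m (dy). The dy signs are an
   assumption consistent with the third-row code above. */
static const int kWdx[3] = {1, 2, 1};
static const int kWdy[3] = {1, 0, -1};

/* Hypothetical result type: raw sums before the >> 3 scaling. */
typedef struct { int dx1, dy1, dx2, dy2; } TwoRowResult;

/* Scalar model of the two-row kernel at column x: each of the four input
   rows is read once, and its contribution is added to every output row
   whose 3x3 window contains it. */
static TwoRowResult sobel_two_rows_reused(const unsigned char *rows[4], int x) {
    TwoRowResult o = {0, 0, 0, 0};
    for (int k = 0; k < 4; ++k) {
        int l = rows[k][x], m = rows[k][x + 1], r = rows[k][x + 2];
        int h = r - l;         /* horizontal difference */
        int s = r + l + 2 * m; /* weighted horizontal sum */
        if (k <= 2) {          /* output row 1 uses input rows 0..2 */
            o.dx1 += kWdx[k] * h;
            o.dy1 += kWdy[k] * s;
        }
        if (k >= 1) {          /* output row 2 uses input rows 1..3 */
            o.dx2 += kWdx[k - 1] * h;
            o.dy2 += kWdy[k - 1] * s;
        }
    }
    return o;
}

/* Reference: direct 3x3 Sobel on one window whose left column is x. */
static void sobel_3x3(const unsigned char *top, const unsigned char *mid,
                      const unsigned char *bot, int x, int *dx, int *dy) {
    static const int gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int gy[3][3] = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}};
    const unsigned char *win[3] = {top, mid, bot};
    *dx = 0;
    *dy = 0;
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            *dx += gx[i][j] * win[i][x + j];
            *dy += gy[i][j] * win[i][x + j];
        }
    }
}
```

For four identical rows containing a sharp vertical edge from 0 to 255, the sketch gives dx = 1020 and dy = 0 for both output rows; the listing's >> 3 maps 1020 to 127, the largest signed 8-bit value.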