// Compute contribution from third row.
lLoad = vload16(0, in + (offset + width * 2 + 0));
mLoad = vload16(0, in + (offset + width * 2 + 1));
rLoad = vload16(0, in + (offset + width * 2 + 2));

lData = convert_short16(lLoad);
mData = convert_short16(mLoad);
rData = convert_short16(rLoad);

_dx += rData - lData;
_dy -= rData + lData + mData * (short16)2;

// Store the results.
vstore16(convert_char16(_dx >> 3), 0, dx + offset + width + 1);
vstore16(convert_char16(_dy >> 3), 0, dy + offset + width + 1);

Listing 7.3. Computing contribution from the third row: char16.
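To see why the stores shift the accumulators right by 3 before converting them to char, it may help to work through a scalar model of one output pixel. The following is a minimal host-side C sketch, not the book's kernel. The function name sobel_pixel is hypothetical. Only the third-row step is taken from Listing 7.3, and the first- and second-row steps are assumed to follow the standard 3x3 Sobel weights. The sketch also assumes that right-shifting a negative value is arithmetic, as it is on mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical scalar model of one work-item's output pixel.
   Only the third-row step is taken from Listing 7.3; the first- and
   second-row steps are ASSUMED to use the standard 3x3 Sobel weights. */
static void sobel_pixel(const uint8_t *in, int width, int offset,
                        int8_t *dx_out, int8_t *dy_out)
{
    const uint8_t *r0 = in + offset;             /* first row  */
    const uint8_t *r1 = in + offset + width;     /* second row */
    const uint8_t *r2 = in + offset + width * 2; /* third row  */
    int16_t dx, dy;

    /* First row (assumed): dx = r - l, dy = r + l + 2m. */
    dx = (int16_t)(r0[2] - r0[0]);
    dy = (int16_t)(r0[2] + r0[0] + r0[1] * 2);

    /* Second row (assumed): dx only, weight 2; the middle pixel is unused. */
    dx = (int16_t)(dx + (r1[2] - r1[0]) * 2);

    /* Third row, as in Listing 7.3. */
    dx = (int16_t)(dx + (r2[2] - r2[0]));
    dy = (int16_t)(dy - (r2[2] + r2[0] + r2[1] * 2));

    /* Each gradient lies in [-1020, 1020]; >> 3 maps that onto [-128, 127].
       Assumes arithmetic right shift of negative values, as on mainstream
       compilers (C leaves it implementation-defined). */
    *dx_out = (int8_t)(dx >> 3);
    *dy_out = (int8_t)(dy >> 3);
}
```

With these weights the magnitude of each gradient is at most 4 * 255 = 1020. Shifting by 3 therefore maps the gradient range onto [-128, 127], exactly the range of a signed 8-bit char. For a 3x3 input whose right column is 255 and whose other pixels are 0, dx reaches 1020 and is stored as 127.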
7.4.6 Reusing Loaded Data
Larger load operations. The char8 kernel performed eight char8 load operations:
three each for the first and third rows, and two for the second row, whose middle
pixel has zero weight. The char8_load16 kernel, partially shown in Listing 7.4,
performs only three char16 load operations, one per row. The required subcomponents
are extracted by swizzle operations, which are often free on the Midgard
architecture. Table 7.1 confirms that the number of memory operations per pixel
decreases, while the kernel can still be launched with up to 128 simultaneous
work-items per core.
Eliminating redundant loads. The char16 kernel performed three char16 load
operations to read 18 bytes for the first and third rows. The char16_swizzle
kernel, partially shown in Listing 7.5, performs two char16 load operations for the
leftmost and rightmost vectors and reconstructs the middle vector by swizzle
operations.
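The swizzle that rebuilds the middle vector follows from the byte offsets. The leftmost vector holds row bytes 0 to 15 and the rightmost holds bytes 2 to 17. The middle vector (bytes 1 to 16) therefore takes lanes 1 to 15 of the leftmost vector and lane 14 of the rightmost. The C sketch below models this on the host. All names are hypothetical, and Listing 7.5 itself may express the selection differently. In OpenCL C, one possible way to write the same selection is the built-in shuffle2 with mask lanes 1 to 15 followed by 30 (lane 14 of the second argument).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host model of the OpenCL uchar16 type. */
typedef struct { uint8_t s[16]; } uchar16_host;

/* Models vload16(0, p): read 16 consecutive bytes in one operation. */
static uchar16_host load16(const uint8_t *p)
{
    uchar16_host v;
    memcpy(v.s, p, sizeof v.s);
    return v;
}

/* Rebuilds the vector at byte offset 1 from the vectors at offsets 0 and 2.
   Lane i of the result is row byte i + 1: for i < 15 that is lane i + 1 of
   `left`; for i == 15 (byte 16) it is lane 14 of `right`. */
static uchar16_host middle_from_lr(uchar16_host left, uchar16_host right)
{
    uchar16_host m;
    for (int i = 0; i < 15; ++i)
        m.s[i] = left.s[i + 1];
    m.s[15] = right.s[14];
    return m;
}

/* Obtains the left, middle and right vectors of an 18-byte row with only
   two 16-byte loads, in the spirit of the char16_swizzle kernel. */
static void load_lmr(const uint8_t *row, uchar16_host *l, uchar16_host *m,
                     uchar16_host *r)
{
    *l = load16(row);     /* bytes 0..15 */
    *r = load16(row + 2); /* bytes 2..17 */
    *m = middle_from_lr(*l, *r);
}
```

Together the two loads cover all 18 bytes, so the third load of the char16 kernel is redundant. The sketch checks this by comparing the rebuilt middle vector with a direct load at offset 1.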
// Compute contribution from third row.
load = vload16(0, in + (offset + width * 2 + 0));

lData = convert_short8(load.s01234567);
mData = convert_short8(load.s12345678);
rData = convert_short8(load.s23456789);

_dx += rData - lData;
_dy -= rData + lData + mData * (short8)2;

// Store the results.
vstore8(convert_char8(_dx >> 3), 0, dx + offset + width + 1);
vstore8(convert_char8(_dy >> 3), 0, dy + offset + width + 1);

Listing 7.4. Computing contribution from the third row: char8_load16.