Optimizing OpenCL Kernels for the ARM Mali-T600 GPUs - GPU Pro: Advanced Rendering Techniques - page 339


// Compute contribution from third row.
lLoad = vload8(0, in + (offset + width * 2 + 0));
mLoad = vload8(0, in + (offset + width * 2 + 1));
rLoad = vload8(0, in + (offset + width * 2 + 2));

lData = convert_short8(lLoad);
mData = convert_short8(mLoad);
rData = convert_short8(rLoad);

_dx1 += rData - lData;
_dy1 -= rData + lData + mData * (short8)2;
_dx2 += (rData - lData) * (short8)2;
_dx3 = rData - lData;
_dy3 = rData + lData + mData * (short8)2;

// ... (intervening lines of the listing omitted)

// Store the results.
vstore8(convert_char8(_dx1 >> 3), 0, dx1 + offset + width + 1);
vstore8(convert_char8(_dy1 >> 3), 0, dy1 + offset + width + 1);
vstore8(convert_char8(_dx2 >> 3), 0, dx2 + offset + width * 2 + 1);
vstore8(convert_char8(_dy2 >> 3), 0, dy2 + offset + width * 2 + 1);
vstore8(convert_char8(_dx3 >> 3), 0, dx3 + offset + width * 3 + 1);
vstore8(convert_char8(_dy3 >> 3), 0, dy3 + offset + width * 3 + 1);

Listing 7.8. Computing contribution from the third row: 3xchar8.
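To make the per-row arithmetic of the listing easier to check, the following is a minimal scalar sketch in plain C of what each vector lane appears to compute for a single output pixel. The function name sobel_pixel and its parameters are hypothetical and do not come from the chapter. The sign convention (top row minus bottom row for dy) is inferred from the listing, where the third row is subtracted from _dy1 and assigned to _dy3. The sketch also assumes that right-shifting a negative value is an arithmetic shift, which OpenCL C guarantees for signed types but standard C leaves implementation-defined.

```c
#include <assert.h>

/* Scalar reference for one output pixel of a 3x3 Sobel filter.
 * in     : 8-bit input image, row-major, `width` pixels per row
 * (x, y) : top-left corner of the 3x3 window
 * dx, dy : receive the horizontal and vertical gradients, scaled by 1/8
 * All names here are illustrative assumptions, not the chapter's code. */
void sobel_pixel(const unsigned char *in, int width, int x, int y,
                 signed char *dx, signed char *dy)
{
    assert(in && dx && dy && width >= 3);

    short dx_acc = 0;
    short dy_acc = 0;

    for (int row = 0; row < 3; ++row) {
        const unsigned char *p = in + (y + row) * width + x;
        short l = p[0], m = p[1], r = p[2];

        /* Row weights 1, 2, 1 for the horizontal gradient. */
        short weight = (row == 1) ? 2 : 1;
        /* Top row counts positively, bottom row negatively, middle row not at all. */
        short sign = (row == 0) ? 1 : (row == 2) ? -1 : 0;

        dx_acc += weight * (r - l);
        dy_acc += sign * (l + 2 * m + r);
    }

    /* The accumulators lie in [-1020, 1020]; shifting by 3 maps them into
     * the signed 8-bit range, as the listing does before convert_char8. */
    *dx = (signed char)(dx_acc >> 3);
    *dy = (signed char)(dy_acc >> 3);
}
```

Calling sobel_pixel on a 3x3 window whose right column is 255 and whose other pixels are 0 yields dx = 127 and dy = 0. The vectorized kernel computes the same quantity for eight neighbouring pixels at once and shares each loaded row among three output rows.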

7.5 Optimizing the General Matrix Multiplication

The Sobel filter implementations have highlighted the importance of using vector instructions and a high number of active work-items. We next study implementations of the general matrix multiplication (GEMM) to elucidate the importance of using caches effectively. We first discuss aspects of the caches and how we optimize for them. At the end, we look at the runtimes on an Arndale development board and compare them with our discussion.

7.5.1 Algorithm

The general matrix multiplication is a function of the Basic Linear Algebra Subprograms (BLAS) API [11] that computes

C = αAB + βC,

where A, B, and C are matrices of floating-point numbers and α and β are scalars.
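As a baseline for the definition above, here is a minimal unoptimized sketch in plain C. It assumes square n × n matrices of single-precision floats stored in row-major order. The name sgemm_ref and the argument order are hypothetical; this is neither the BLAS interface nor the chapter's OpenCL kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference for C = alpha * A * B + beta * C.
 * A, B, C are n x n matrices in row-major order (an assumption of this
 * sketch). C is updated in place. */
void sgemm_ref(size_t n, float alpha, const float *A, const float *B,
               float beta, float *C)
{
    assert(A && B && C);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];

            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

In this loop order the inner loop walks B with a stride of n elements, so consecutive reads fall in different cache lines once n is large. Access patterns of this kind are what a cache-aware implementation has to address.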

7.5.2 Implementation Details

In our implementation, the matrices are N × N arrays of single-precision floating-point numbers (SGEMM). We consider two common SGEMM variants:

[11] http://www.netlib.org/blas
