Optimizing OpenCL Kernels for the ARM Mali-T600 GPUs - GPU Pro: Advanced Rendering Techniques

Fixed local work size. Using the notation introduced previously, work-item (j, i) performs the following reads in loop iteration k:

A[i, k], B[4k+0, j], B[4k+1, j], B[4k+2, j], B[4k+3, j],

where each memory access now loads a float4 vector. With many active threads, we will first see all threads performing their first reads from A, and thereafter we will see all threads performing their first read from B, etc. This implies that reads that are executed after each other correspond to different threads executing the same instruction in the program code. With a local work size of (4, 32), the GPU initiates the work-items for work-group (m, n) by incrementing the first index first, i.e., in the order

(4m, 32n), (4m+1, 32n), (4m+2, 32n), (4m+3, 32n),
(4m, 32n+1), (4m+1, 32n+1), (4m+2, 32n+1), (4m+3, 32n+1),
(4m, 32n+2), (4m+1, 32n+2), (4m+2, 32n+2), (4m+3, 32n+2),
...,
(4m, 32n+31), (4m+1, 32n+31), (4m+2, 32n+31), (4m+3, 32n+31),

where we have again used the comma as a sequencing operation to describe the ordering of global_id values of the work-items.
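The ordering described above can be sketched as a small host-side simulation. This is only an illustration in Python, not OpenCL code: the function name `work_item_order` and its parameters are hypothetical, and it assumes the work-items of a group are started strictly in the stated sequence.

```python
def work_item_order(m, n, local_size=(4, 32)):
    """Global IDs (j, i) of work-group (m, n), first index incremented first.

    Hypothetical helper for illustration; j models get_global_id(0) and
    i models get_global_id(1).
    """
    lx, ly = local_size
    return [(lx * m + dj, ly * n + di)
            for di in range(ly)   # second index: outer loop, changes slowly
            for dj in range(lx)]  # first index: inner loop, changes fastest

order = work_item_order(m=1, n=2)
print(len(order))   # number of work-items in the group
print(order[:5])    # first five global IDs in start order
```

With m = 1 and n = 2, the first four entries share i = 64 while j runs over 4..7, and only then does i advance to 65.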

This means that the memory reads for loop iteration k will execute in the following order:

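$$
\begin{aligned}
&\operatorname*{seq}_{i=32n}^{32n+31}\;\operatorname*{seq}_{j=4m}^{4m+3} A[i,k],\\
&\operatorname*{seq}_{i=32n}^{32n+31}\;\operatorname*{seq}_{j=4m}^{4m+3} B[4k+0,j],\quad
 \operatorname*{seq}_{i=32n}^{32n+31}\;\operatorname*{seq}_{j=4m}^{4m+3} B[4k+1,j],\\
&\operatorname*{seq}_{i=32n}^{32n+31}\;\operatorname*{seq}_{j=4m}^{4m+3} B[4k+2,j],\quad
 \operatorname*{seq}_{i=32n}^{32n+31}\;\operatorname*{seq}_{j=4m}^{4m+3} B[4k+3,j],
\end{aligned}
$$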

where the ID variable j is incremented before i, as it corresponds to get_global_id(0), and it is therefore written as the innermost sequencing operator.
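As a rough illustration of this read order, the following Python sketch lists the accesses of one work-group for a single loop iteration k. It assumes an idealized model in which every work-item issues its read from A before any work-item issues a read from B, and so on for each row of B; the name `read_order` and the tuple encoding of an access are hypothetical.

```python
def read_order(m, n, k, local_size=(4, 32)):
    """Reads issued by work-group (m, n) in loop iteration k, in order.

    Each read is encoded as (matrix, row, column), where the column
    indexes float4 vectors. Idealized model, for illustration only.
    """
    lx, ly = local_size
    # Work-items (j, i) in start order: j (first index) varies fastest.
    items = [(lx * m + dj, ly * n + di)
             for di in range(ly) for dj in range(lx)]
    # First, every work-item reads A[i, k].
    reads = [("A", i, k) for (j, i) in items]
    # Then, for each of the four rows of B, every work-item reads B[4k+r, j].
    for r in range(4):
        reads += [("B", 4 * k + r, j) for (j, i) in items]
    return reads

reads = read_order(m=0, n=0, k=0)
print(reads[:5])        # four identical reads of A[0, 0], then A[1, 0]
print(reads[128:133])   # B[0, 0..3], then B[0, 0] again
```

In this model the reads from B cycle through only four distinct vectors per row, because j takes just the four values 4m, ..., 4m+3 within the group.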

We see that the reads from A do not depend on j and are therefore repeated for each group of four consecutive work-items, and we introduce a repetition operation to reflect repetition of the same memory access, as in

$$
\operatorname*{seq}_{j=4m}^{4m+3} A[i,k] = \operatorname*{rep} A[i,k]
$$
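The repetition can be checked with a short Python sketch that collapses consecutive identical reads of A into runs. It makes the same idealized in-order assumption as the text; the helpers `a_reads` and `run_lengths` are hypothetical names used only for this illustration.

```python
from itertools import groupby

def a_reads(m, n, k, local_size=(4, 32)):
    """Reads of A issued by work-group (m, n) in loop iteration k.

    The read A[i, k] does not depend on j, so it appears once for each
    of the lx work-items that share the same i.
    """
    lx, ly = local_size
    return [("A", ly * n + di, k) for di in range(ly) for _ in range(lx)]

def run_lengths(reads):
    """Collapse consecutive identical reads into (read, count) pairs."""
    return [(read, len(list(run))) for read, run in groupby(reads)]

runs = run_lengths(a_reads(m=0, n=0, k=3))
print(len(runs))   # number of distinct consecutive reads
print(runs[0])     # first read and how many times it repeats
```

Under these assumptions the 128 reads of A collapse into 32 runs of four identical accesses each, one run per value of i.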
