IV
7 Optimizing OpenCL Kernels for the ARM® Mali™-T600 GPUs
Johan Gronqvist and Anton Lokhmotov
7.1 Introduction
OpenCL is a relatively young industry-backed standard API that aims to provide
functional portability across systems equipped with computational accelerators
such as GPUs: a standard-conforming OpenCL program can be executed on any
standard-conforming OpenCL implementation.
OpenCL, however, does not address the issue of performance portability: transforming an OpenCL program to achieve higher performance on one device may actually lead to lower performance on another device, since performance may depend significantly on low-level details, such as iteration space mapping and data layout [Howes et al. 10, Ryoo et al. 08].
Due to the popularity of certain GPU architectures, some optimizations have become hallmarks of GPU computing, e.g., coalescing global memory accesses or using local memory. Emerging mobile and embedded OpenCL-capable GPUs, however, have a rather different organization. Therefore, even seasoned GPU developers may need to forgo their instincts and learn new techniques when optimizing for their battery-powered GPU brethren.
In this chapter, we introduce the ARM Mali-T600 GPU series (Section 7.3) and discuss the performance characteristics of several versions of the Sobel edge detection filter (Section 7.4) and the general matrix multiplication (Section 7.5).¹
We make no claim that the presented versions are the fastest possible implementations of the selected algorithms. Rather, we aim to provide an insight into
ARM is a registered trademark of ARM Limited (or its subsidiaries) in the EU and/or
elsewhere. Mali is a trademark of ARM Limited (or its subsidiaries) in the EU and/or elsewhere.
All rights reserved.
1 Source code for some versions is available in the Mali OpenCL SDK [ARM 13].