which transformations should be considered when optimizing kernel code for the
Mali-T600 GPUs. Therefore, the described behavior may differ from the actual
behavior for expository purposes.
We perform our experiments on an Arndale development board² powered by the
Samsung Exynos 5250 chip. The Exynos 5250 comprises a dual-core Cortex-A15
CPU at 1.7 GHz and a quad-core Mali-T604 GPU at 533 MHz. The OpenCL driver
is version 3.0 Beta.
7.2 Overview of the OpenCL Programming Model
In the OpenCL programming model, the host (e.g., a CPU) manages one or more
devices (e.g., a GPU) through calls to the OpenCL API. A call to the
clEnqueueNDRangeKernel() API function submits a job that executes the same
program (kernel)³ on a selected device as a collection of work-items.
Each work-item has a unique index (global ID) in a multidimensional iteration
space (ND-range), specified as the global work size argument to
clEnqueueNDRangeKernel(). The local work size argument to
clEnqueueNDRangeKernel() determines how the ND-range is partitioned into
uniformly sized work-groups. Work-items within the same work-group can
synchronize using the barrier() built-in function, which must be executed by
all work-items in the work-group before any work-item continues execution past
the barrier.
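The partitioning of the ND-range into work-groups can be sketched in plain C. This is only an illustration of the arithmetic, not driver code: the helpers num_work_groups() and num_work_groups_2d() are hypothetical names and not part of the OpenCL API. The sketch assumes, as OpenCL 1.1 requires when a local work size is given, that the local work size evenly divides the global work size in each dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the OpenCL API): number of work-groups
 * along one dimension of the ND-range. Returns 0 if the local work size
 * does not evenly divide the global work size, in which case the ND-range
 * cannot be split into uniformly sized work-groups. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    if (global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}

/* Hypothetical helper: total number of work-groups in a two-dimensional
 * ND-range, given the global and local work sizes per dimension. */
size_t num_work_groups_2d(const size_t global[2], const size_t local[2])
{
    return num_work_groups(global[0], local[0]) *
           num_work_groups(global[1], local[1]);
}
```

For instance, under these assumptions a global work size of (1024, 512) with a local work size of (4, 16) yields 256 work-groups along the first dimension and 32 along the second.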
In our examples, the ND-range is two-dimensional: work-items iterate over
the pixels of the output image (Section 7.4) or the elements of the output matrix
(Section 7.5), while work-groups iterate over partitions (or tiles) of the same.
For example, a local work size of (4, 16) would result in 64 work-items per
work-group, with the first work-group having global IDs (x, y), where
0 ≤ x < 4 and 0 ≤ y < 16.
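The relationship between global IDs, work-groups, and positions within a work-group in this example can be checked with a small plain-C model. The type id2 and the functions group_of(), local_of(), and work_group_size() are hypothetical names introduced here for illustration, and the sketch assumes a zero global offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D index type for illustration; not part of the OpenCL API. */
typedef struct { size_t x, y; } id2;

/* Number of work-items in one work-group with local work size l. */
size_t work_group_size(id2 l)
{
    return l.x * l.y;
}

/* Index of the work-group that contains global ID g (zero global offset). */
id2 group_of(id2 g, id2 l)
{
    assert(l.x > 0 && l.y > 0);
    id2 r = { g.x / l.x, g.y / l.y };
    return r;
}

/* Position of global ID g within its work-group (zero global offset). */
id2 local_of(id2 g, id2 l)
{
    assert(l.x > 0 && l.y > 0);
    id2 r = { g.x % l.x, g.y % l.y };
    return r;
}
```

With a local work size of (4, 16), work_group_size() gives 64, and group_of() maps every global ID with 0 ≤ x < 4 and 0 ≤ y < 16 to work-group (0, 0). Inside a kernel, the corresponding values are available through the OpenCL C built-ins get_global_id(), get_group_id(), and get_local_id().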
7.3 ARM Mali-T600 GPU Series
The ARM Mali-T600 GPU series based on the Midgard architecture is designed
to meet the growing needs of graphics and compute applications for a wide
range of consumer electronics, from phones and tablets to TVs and beyond. The
Mali-T604 GPU, the first implementation of the Mali-T600 series, was the first
mobile and embedded class GPU to pass the OpenCL v1.1 Full Profile conformance
tests.
³ The kernel is typically written in the OpenCL C language, which is a superset
of a subset of the C99 language standard.