which transformations should be considered when optimizing kernel code for the
Mali-T600 GPUs. Therefore, the described behavior may differ from the actual
behavior for expository purposes.
We perform our experiments on an Arndale development board² powered by
the Samsung Exynos 5250 chip. The Exynos 5250 comprises a dual-core Cortex-A15
CPU at 1.7 GHz and a quad-core Mali-T604 GPU at 533 MHz. The OpenCL
driver version is 3.0 Beta.
7.2 Overview of the OpenCL Programming Model
In the OpenCL programming model, the host (e.g., a CPU) manages one or more
devices (e.g., a GPU) through calls to the OpenCL API. A call to the
clEnqueueNDRangeKernel() API function submits a job that executes the same
program (kernel)³ on a selected device as a collection of work-items.
Each work-item has a unique index (global ID) in a multidimensional iteration
space (ND-range), specified as the global_work_size argument to
clEnqueueNDRangeKernel(). The local_work_size argument to
clEnqueueNDRangeKernel() determines how the ND-range is partitioned into
uniformly sized work-groups. Work-items within the same work-group can
synchronize using the barrier() built-in function, which must be executed by
all work-items in the work-group before any work-item continues execution past
the barrier.
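
To make the notion of an ND-range concrete, the following is a minimal sketch in
plain C that emulates a two-dimensional ND-range serially: the same function is
invoked once per global ID. It is not OpenCL code and makes no calls to the
OpenCL API; the names kernel_fn, fill_index_kernel, and run_nd_range are
hypothetical and introduced only for illustration. On a real device, work-items
may run concurrently and in no guaranteed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel: one invocation per work-item,
   identified by its global ID (gx, gy). */
typedef void (*kernel_fn)(size_t gx, size_t gy, size_t width, int *out);

/* Example "kernel": each work-item writes its own linearized global ID. */
void fill_index_kernel(size_t gx, size_t gy, size_t width, int *out)
{
    out[gy * width + gx] = (int)(gy * width + gx);
}

/* Serial emulation of a 2D ND-range of size global_x by global_y.
   Returns the number of work-items that were executed. */
size_t run_nd_range(kernel_fn kernel, size_t global_x, size_t global_y,
                    int *out)
{
    size_t launched = 0;
    assert(kernel != NULL && out != NULL);
    for (size_t gy = 0; gy < global_y; ++gy) {
        for (size_t gx = 0; gx < global_x; ++gx) {
            kernel(gx, gy, global_x, out); /* same program, different ID */
            ++launched;
        }
    }
    return launched;
}
```

Calling run_nd_range(fill_index_kernel, 8, 4, out) with an array of 32 integers
executes 32 work-items, each touching exactly one element of the output.
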
In our examples, the ND-range is two-dimensional: work-items iterate over
the pixels of the output image (Section 7.4) or the elements of the output matrix
(Section 7.5), while work-groups iterate over partitions (or tiles) of the same.
For example, a local work size of (4, 16) would result in 64 work-items per
work-group, with the first work-group having global IDs (x, y), where
0 ≤ x < 4 and 0 ≤ y < 16.
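
The partitioning in this example can be checked with a short sketch in plain C.
It assumes a zero global offset and a global work size that is a multiple of the
local work size in each dimension; under these assumptions a global ID splits
into a work-group ID (the quotient) and a local ID (the remainder). The names
work_item_pos, split_global_id, and count_work_items_in_group are hypothetical
and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Position of a work-item along one dimension of the ND-range. */
typedef struct {
    size_t group_id; /* index of the work-group (tile) */
    size_t local_id; /* index of the work-item within its work-group */
} work_item_pos;

/* Split a global ID along one dimension, assuming a zero global offset. */
work_item_pos split_global_id(size_t global_id, size_t local_size)
{
    work_item_pos pos;
    assert(local_size > 0);
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}

/* Count the work-items of a global_x by global_y ND-range that belong to
   work-group (group_x, group_y) for a local work size (local_x, local_y). */
size_t count_work_items_in_group(size_t global_x, size_t global_y,
                                 size_t local_x, size_t local_y,
                                 size_t group_x, size_t group_y)
{
    size_t count = 0;
    for (size_t y = 0; y < global_y; ++y) {
        for (size_t x = 0; x < global_x; ++x) {
            work_item_pos px = split_global_id(x, local_x);
            work_item_pos py = split_global_id(y, local_y);
            if (px.group_id == group_x && py.group_id == group_y)
                ++count;
        }
    }
    return count;
}
```

For a hypothetical 16 × 32 ND-range with a local work size of (4, 16), the call
count_work_items_in_group(16, 32, 4, 16, 0, 0) counts the work-items with
0 ≤ x < 4 and 0 ≤ y < 16, that is, the 64 work-items of the first work-group.
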
7.3 ARM Mali-T600 GPU Series
The ARM Mali-T600 GPU series, based on the Midgard architecture, is designed
to meet the growing needs of graphics and compute applications for a wide
range of consumer electronics, from phones and tablets to TVs and beyond. The
Mali-T604 GPU, the first implementation of the Mali-T600 series, was the first
mobile- and embedded-class GPU to pass the OpenCL v1.1 Full Profile
conformance tests.
2 http://www.arndaleboard.org
3 The kernel is typically written in the OpenCL C language, which is a superset of a subset
of the C99 language standard.