7.3.5 A Note on Power
We discuss performance in terms of the time it takes to complete a computation. When analyzing the performance of an OpenCL kernel running on a mobile device, it is also important to consider the power and energy required for the execution. Often, the mobile device's thermal dissipation capacity will determine the maximum DVFS operating point (voltage and frequency) at which the GPU can be run. On Mali-T600 GPUs, it is often sufficient to characterize the performance of the kernel in terms of the number of cycles required for the execution. To determine the overall performance, one also has to factor in the GPU clock rate.
We focus on the cycle count of kernel execution and consider this sufficient for our optimization purposes. We posit that GPU power is (to a broad approximation) constant across sustained GPGPU workloads at a fixed operating point. 9
Energy consumed for a given workload is therefore determined by performance: both the number of cycles for that workload and the operating point required to meet the required performance target.
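The relationship between cycle count, clock rate, and energy described above can be sketched in a few lines of C. This is only an illustration under the stated constant-power approximation; the helper names `kernel_time_s` and `kernel_energy_j` are hypothetical, and any numbers passed to them would be placeholders rather than Mali-T600 measurements.

```c
#include <assert.h>

/* Hypothetical helper: execution time in seconds of a kernel that needs
 * `cycles` GPU cycles when the GPU runs at `clock_hz` cycles per second. */
static double kernel_time_s(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles / clock_hz;
}

/* Hypothetical helper: energy in joules for the same kernel, ASSUMING that
 * GPU power (`power_w`, in watts) is constant at the chosen operating point.
 * Under that assumption, energy = power * time. */
static double kernel_energy_j(double cycles, double clock_hz, double power_w)
{
    assert(power_w >= 0.0);
    return power_w * kernel_time_s(cycles, clock_hz);
}
```

At a fixed operating point both `clock_hz` and `power_w` stay the same, so under this model halving the cycle count halves both the execution time and the energy. This is why the cycle count alone serves as the optimization metric here.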
7.4 Optimizing the Sobel Image Filter
7.4.1 Algorithm
Our first example is the Sobel 3 × 3 image filter used within edge-detection algorithms. Technically speaking, the Sobel filter is a (2K + 1) × (2K + 1) convolution of an input image I with a constant mask C:

\[
O_{y,x} = \sum_{u=-K}^{K} \sum_{v=-K}^{K} I_{y+u,\,x+v} \cdot C_{u,v},
\]
taking an image containing the luminosity values and producing two images con-
taining the discretized gradient values along the horizontal and vertical directions:
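\[
O^{dx}_{y,x} = \sum_{u=-1}^{1} \sum_{v=-1}^{1} I_{y+u,\,x+v} \cdot C^{dx}_{u,v},
\quad \text{where } C^{dx} =
\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix},
\]

\[
O^{dy}_{y,x} = \sum_{u=-1}^{1} \sum_{v=-1}^{1} I_{y+u,\,x+v} \cdot C^{dy}_{u,v},
\quad \text{where } C^{dy} =
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}.
\]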
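As a point of reference for the definition above, the two convolutions can be written as a plain scalar loop in C. This is a minimal sketch, not the optimized OpenCL kernel: the function name `sobel_reference`, the 8-bit input and 16-bit output types, the row-major layout, and the choice to skip border pixels are all assumptions made for illustration. The mask signs follow the conventional Sobel coefficients.

```c
#include <assert.h>

/* Sobel masks, indexed as [u + 1][v + 1] for u, v in {-1, 0, 1}.
 * Signs follow the conventional Sobel coefficients (an assumption). */
static const int C_DX[3][3] = { { -1,  0,  1 },
                                { -2,  0,  2 },
                                { -1,  0,  1 } };
static const int C_DY[3][3] = { {  1,  2,  1 },
                                {  0,  0,  0 },
                                { -1, -2, -1 } };

/* Hypothetical scalar reference implementation.
 * `in` holds width * height luminosity values in row-major order.
 * Only interior pixels are written; the one-pixel border of out_dx and
 * out_dy is left untouched. With 8-bit input, each result lies in
 * [-1020, 1020], so a 16-bit signed output is wide enough. */
static void sobel_reference(const unsigned char *in,
                            short *out_dx, short *out_dy,
                            int width, int height)
{
    assert(in && out_dx && out_dy);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int dx = 0;
            int dy = 0;
            for (int u = -1; u <= 1; ++u) {
                for (int v = -1; v <= 1; ++v) {
                    /* I[y + u][x + v] in the notation of the text. */
                    int pixel = in[(y + u) * width + (x + v)];
                    dx += pixel * C_DX[u + 1][v + 1];
                    dy += pixel * C_DY[u + 1][v + 1];
                }
            }
            out_dx[y * width + x] = (short)dx;
            out_dy[y * width + x] = (short)dy;
        }
    }
}
```

A loop of this kind can serve as a correctness baseline when checking the output of an optimized kernel. It reads nine input values and performs eighteen multiply-accumulates per output pixel, with no vectorization and no reuse of loaded data.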
9 This assumption holds broadly on the Mali-T600 GPU series, as a result of the pipeline
architecture, aggressive clock gating, and the GPU's ability to hide memory latency effectively.