FIGURE 4.30 Raw and relative performance measured for the two platforms. In this
study, SAXPY is used only as a measure of memory bandwidth, so the right unit is GBytes/sec
and not GFLOP/sec. (Based on Table 3 in [Lee et al. 2010].)
Given that the raw performance specifications of the GTX 280 vary from 2.5× slower (clock
rate) to 7.5× faster (cores per chip) while the performance varies from 2.0× slower (Solv) to
15.2× faster (GJK), the Intel researchers explored the reasons for the differences:
Memory bandwidth. The GPU has 4.4× the memory bandwidth, which helps explain why
LBM and SAXPY run 5.0× and 5.3× faster; their working sets are hundreds of megabytes
and hence don't fit into the Core i7 cache. (To access memory intensively, they did not use
cache blocking on SAXPY.) Hence, the slope of the rooflines explains their performance.
SpMV also has a large working set, but it runs only 1.9× faster because the double-precision
floating point of the GTX 280 is only 1.5× faster than that of the Core i7. (Recall that the
Fermi GTX 480 double precision is 4× faster than the Tesla GTX 280.)
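To see why SAXPY works as a bandwidth probe rather than a compute benchmark, it helps to count its memory traffic. The Python sketch below is illustrative only and is not code from the study. The function names are hypothetical, and it assumes 4-byte single-precision elements and no cache reuse, so every operand travels to or from main memory.

```python
from array import array


def saxpy(a, x, y):
    """Compute y[i] = a * x[i] + y[i] in place (single-precision arrays)."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


def saxpy_traffic_bytes(n, bytes_per_element=4):
    """Bytes moved for n elements, assuming no cache reuse.

    Each element reads x[i], reads y[i], and writes y[i]: three accesses.
    """
    return 3 * n * bytes_per_element


def saxpy_flops(n):
    """One multiply and one add per element."""
    return 2 * n


def arithmetic_intensity(n, bytes_per_element=4):
    """Floating-point operations per byte of memory traffic."""
    return saxpy_flops(n) / saxpy_traffic_bytes(n, bytes_per_element)


def effective_bandwidth_gbytes_per_sec(n, seconds, bytes_per_element=4):
    """Convert a measured run time into GBytes/sec (1 GByte = 1e9 bytes)."""
    return saxpy_traffic_bytes(n, bytes_per_element) / seconds / 1e9


if __name__ == "__main__":
    x = array("f", [1.0, 2.0, 3.0])
    y = array("f", [10.0, 20.0, 30.0])
    saxpy(2.0, x, y)
    print(list(y))
    print(arithmetic_intensity(1_000_000))
```

Under these assumptions SAXPY performs 2 floating-point operations for every 12 bytes moved, so its run time is set almost entirely by memory bandwidth. This is why the figure reports it in GBytes/sec.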
Compute bandwidth. Five of the remaining kernels are compute bound: SGEMM, Conv, FFT,
MC, and Bilat. The GTX 280 is faster by 3.9×, 2.8×, 3.0×, 1.8×, and 5.7×, respectively. The
first three of these use single-precision floating-point arithmetic, and GTX 280 single
precision is 3 to 6× faster. (The 9× advantage over the Core i7 shown in Figure 4.27 occurs
only in the very special case when the GTX 280 can issue a fused multiply-add and a multiply
per clock cycle.) MC uses double precision, which explains why it is only 1.8× faster, since
DP performance is only 1.5× faster. Bilat uses transcendental functions, which the GTX 280
supports directly (see Figure 4.17). The Core i7 spends two-thirds of its time calculating
transcendental functions, so the GTX 280 is 5.7× faster. This observation helps point out the
value of hardware support for operations that occur in your workload: double-precision
floating point and perhaps even transcendentals.
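The two-thirds figure for Bilat can be read through Amdahl's law. The sketch below is an illustration added here, not an analysis from the study. It assumes the transcendental fraction of Core i7 run time is exactly 2/3 and asks how much the whole kernel would speed up if only that fraction were accelerated by a hypothetical factor. The function name and the sample factors are assumptions.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of run time is accelerated by `factor`.

    Amdahl's law: 1 / ((1 - fraction) + fraction / factor).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    transcendental_fraction = 2.0 / 3.0  # from the text: two-thirds of Core i7 time
    for factor in (2.0, 10.0, 1e6):      # hypothetical acceleration factors
        print(factor, amdahl_speedup(transcendental_fraction, factor))
```

With a fraction of 2/3, the overall speedup approaches but never exceeds 3× no matter how fast the transcendentals become. Under this assumption, direct hardware support for transcendentals accounts for a large share of such a gap, but the rest of the kernel must also run faster to reach 5.7×.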
 