FIGURE 4.28 Roofline model [Williams et al. 2009]. These rooflines show double-precision
floating-point performance in the top row and single-precision performance in the bottom row.
(The DP FP performance ceiling is also in the bottom row to give perspective.) The Core i7
920 on the left has a peak DP FP performance of 42.66 GFLOP/sec, an SP FP peak of 85.33
GFLOP/sec, and a peak memory bandwidth of 16.4 GBytes/sec. The NVIDIA GTX 280 has a
DP FP peak of 78 GFLOP/sec, an SP FP peak of 624 GFLOP/sec, and 127 GBytes/sec of
memory bandwidth. The dashed vertical line on the left represents an arithmetic intensity of
0.5 FLOP/byte; a kernel at this intensity is limited by memory bandwidth to no more than
8 DP GFLOP/sec or 8 SP GFLOP/sec on the Core i7. The dashed vertical line on the right
represents an arithmetic intensity of 4 FLOP/byte. There, DP performance is limited only
computationally, to 42.66 DP GFLOP/sec on the Core i7 and 78 DP GFLOP/sec on the GTX 280,
while SP performance is still capped by memory bandwidth at about 64 SP GFLOP/sec on the
Core i7 and about 512 SP GFLOP/sec on the GTX 280. To hit the highest computation rate on
the Core i7, you need to use all 4 cores and SSE instructions with an equal number of
multiplies and adds. For the GTX 280, you need to use fused multiply-add
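The roofline bounds quoted in the caption follow from one rule: attainable performance is the smaller of the peak floating-point rate and the arithmetic intensity multiplied by the peak memory bandwidth. The Python sketch below is a minimal illustration that assumes only the peak figures quoted in the caption. It recomputes the bounds at the two dashed lines. The helper names `attainable_gflops`, `ridge_point`, and `bound_kind` are hypothetical and introduced here for illustration; they are not part of any library.

```python
# Minimal roofline sketch (illustrative; helper names are hypothetical).
# Attainable GFLOP/sec = min(peak compute, intensity * peak bandwidth).


def attainable_gflops(peak_gflops: float, bandwidth_gbytes: float, intensity: float) -> float:
    """Roofline bound in GFLOP/sec for a kernel with the given FLOP/byte intensity."""
    return min(peak_gflops, intensity * bandwidth_gbytes)


def ridge_point(peak_gflops: float, bandwidth_gbytes: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the roofline flattens to peak compute."""
    return peak_gflops / bandwidth_gbytes


def bound_kind(peak_gflops: float, bandwidth_gbytes: float, intensity: float) -> str:
    """Report whether compute or memory bandwidth sets the bound at this intensity."""
    if intensity >= ridge_point(peak_gflops, bandwidth_gbytes):
        return "compute"
    return "memory"


# Peak figures taken from the caption: GFLOP/sec for DP/SP, GBytes/sec for bandwidth.
MACHINES = {
    "Core i7 920": {"DP": 42.66, "SP": 85.33, "bw": 16.4},
    "GTX 280": {"DP": 78.0, "SP": 624.0, "bw": 127.0},
}

if __name__ == "__main__":
    for name, m in MACHINES.items():
        for prec in ("DP", "SP"):
            for ai in (0.5, 4.0):  # the two dashed vertical lines
                bound = attainable_gflops(m[prec], m["bw"], ai)
                kind = bound_kind(m[prec], m["bw"], ai)
                print(f"{name:12s} {prec} @ {ai} FLOP/byte: {bound:7.2f} GFLOP/sec ({kind}-bound)")
```

The sketch gives 8.2, 65.6, and 508 GFLOP/sec for the memory-bound cases. The caption rounds these to 8, 64, and 512. Comparing an intensity with `ridge_point` shows why the 4 FLOP/byte line is compute-bound in DP but still memory-bound in SP on both machines: the SP ridge points are about 5.2 FLOP/byte on the Core i7 and about 4.9 FLOP/byte on the GTX 280.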
 