Hardware Reference
FIGURE 4.20 Block diagram of the multithreaded SIMD Processor of a Fermi GPU. Each
SIMD Lane has a pipelined floating-point unit, a pipelined integer unit, some logic for
dispatching instructions and operands to these units, and a queue for holding results. The four
Special Function units (SFUs) calculate functions such as square roots, reciprocals, sines, and
cosines.
Fermi introduces several innovations to bring GPUs much closer to mainstream system
processors than Tesla and previous generations of GPU architectures:
Fast Double-Precision Floating-Point Arithmetic: Fermi matches the relative double-precision
speed of conventional processors, running double precision at roughly half the speed of single
precision, versus a tenth the speed of single precision in the prior Tesla generation. That is,
there is no order-of-magnitude temptation to use single precision when the accuracy calls for
double precision. The peak double-precision performance grew from 78 GFLOP/sec in the
predecessor GPU to 515 GFLOP/sec when using multiply-add instructions.
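The arithmetic behind a peak figure such as 515 GFLOP/sec can be sketched as lanes times floating-point operations per lane per clock times clock rate. The lane count (448) and clock rate (1.15 GHz) below are assumptions chosen for illustration and are not stated in the text; the function name `peak_gflops` is likewise hypothetical. The half-rate factor for double precision and the counting of a multiply-add as two operations follow the paragraph above.

```python
def peak_gflops(lanes, flops_per_lane_per_clock, clock_ghz):
    """Peak throughput in GFLOP/sec: lanes * FLOPs per lane per clock * clock in GHz."""
    return lanes * flops_per_lane_per_clock * clock_ghz


# Assumed configuration (not given in the text): 448 SIMD lanes at 1.15 GHz.
LANES = 448
CLOCK_GHZ = 1.15

# A multiply-add instruction counts as two floating-point operations.
FLOPS_PER_MULTIPLY_ADD = 2

# Single precision: every lane issues one multiply-add per clock.
sp_peak = peak_gflops(LANES, FLOPS_PER_MULTIPLY_ADD, CLOCK_GHZ)

# Double precision: roughly half the single-precision rate, modeled here
# as half as many lanes issuing a multiply-add each clock.
dp_peak = peak_gflops(LANES // 2, FLOPS_PER_MULTIPLY_ADD, CLOCK_GHZ)

# Improvement over the 78 GFLOP/sec quoted for the predecessor GPU.
PREDECESSOR_DP_GFLOPS = 78.0
dp_speedup = dp_peak / PREDECESSOR_DP_GFLOPS

print(f"single precision peak: {sp_peak:.1f} GFLOP/sec")
print(f"double precision peak: {dp_peak:.1f} GFLOP/sec")
print(f"double precision gain over predecessor: {dp_speedup:.1f}x")
```

Under these assumed parameters the double-precision peak comes out at about 515 GFLOP/sec, consistent with the figure quoted above, and without multiply-add instructions the same model would give half that value.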
 