Hardware Reference
In-Depth Information
FIGURE 4.14 Simplified block diagram of a multithreaded SIMD Processor. It has 16
SIMD lanes. The SIMD Thread Scheduler has, say, 48 independent threads of SIMD
instructions that it schedules with a table of 48 PCs.
The GPU hardware then contains a collection of multithreaded SIMD Processors that
execute a Grid of Thread Blocks (bodies of a vectorized loop); that is, a GPU is a
multiprocessor composed of multithreaded SIMD Processors.
The first four implementations of the Fermi architecture have 7, 11, 14, or 15 multithreaded
SIMD Processors; future versions may have just 2 or 4. To provide transparent scalability
across GPU models with differing numbers of multithreaded SIMD Processors, the Thread
Block Scheduler assigns Thread Blocks (bodies of a vectorized loop) to multithreaded SIMD
Processors. Because the hardware scheduler makes this assignment, the same Grid runs
unchanged on any model. Figure 4.15 shows the floor plan of the GTX 480 implementation
of the Fermi architecture.
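
The assignment of Thread Blocks to multithreaded SIMD Processors described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the actual hardware policy: the function name assign_thread_blocks, the greedy "next free processor" rule, the limit of one resident Thread Block per processor, and the default cost of one time unit per Thread Block are all hypothetical choices made for illustration.

```python
import heapq


def assign_thread_blocks(num_blocks, num_processors, block_cost=None):
    """Toy model of a Thread Block Scheduler (illustrative, not the real policy).

    Each pending Thread Block is handed to whichever multithreaded SIMD
    Processor becomes free first. Returns (assignment, makespan), where
    assignment[p] lists the block ids run by processor p and makespan is
    the time at which the last block finishes.
    """
    if num_processors < 1:
        raise ValueError("need at least one SIMD Processor")
    if block_cost is None:
        # Assumption: every Thread Block takes one time unit.
        block_cost = [1] * num_blocks

    # Min-heap of (time at which the processor is free, processor id).
    free_at = [(0, p) for p in range(num_processors)]
    heapq.heapify(free_at)

    assignment = [[] for _ in range(num_processors)]
    for block_id in range(num_blocks):
        t, p = heapq.heappop(free_at)      # earliest-free processor
        assignment[p].append(block_id)
        heapq.heappush(free_at, (t + block_cost[block_id], p))

    makespan = max(t for t, _ in free_at)
    return assignment, makespan


if __name__ == "__main__":
    # The same Grid of 210 Thread Blocks on the four Fermi configurations
    # named in the text; only the processor count changes.
    for processors in (7, 11, 14, 15):
        _, rounds = assign_thread_blocks(210, processors)
        print(processors, "SIMD Processors:", rounds, "time units")
```

Under the unit-cost assumption, the 210-block Grid finishes in 30, 20, 15, and 14 time units on 7, 11, 14, and 15 processors respectively. The Grid itself is never modified, which is the sense in which the scaling is transparent to the program.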
 