Hardware Reference
In-Depth Information
FIGURE 4.13 The mapping of a Grid (vectorizable loop), Thread Blocks (SIMD basic
blocks), and threads of SIMD instructions to a vector-vector multiply, with each vector
being 8192 elements long. Each thread of SIMD instructions calculates 32 elements per
instruction, and in this example each Thread Block contains 16 threads of SIMD instructions
and the Grid contains 16 Thread Blocks. The hardware Thread Block Scheduler assigns
Thread Blocks to multithreaded SIMD Processors and the hardware Thread Scheduler picks
which thread of SIMD instructions to run each clock cycle within a SIMD Processor. Only
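
The sizes in the caption multiply out to the full vector: 32 elements per SIMD instruction x 16 threads of SIMD instructions per Thread Block x 16 Thread Blocks = 8192 elements. The following is a minimal Python sketch of that decomposition, not the hardware's actual scheduling. The constant and function names are hypothetical, and the contiguous assignment of elements to Thread Blocks and SIMD threads is an assumption made for illustration.

```python
# Sizes taken from the caption; all names here are illustrative.
VECTOR_LENGTH = 8192
ELEMENTS_PER_SIMD_INSTRUCTION = 32
SIMD_THREADS_PER_BLOCK = 16
THREAD_BLOCKS_PER_GRID = 16

# Elements covered by one Thread Block: 32 * 16 = 512.
ELEMENTS_PER_BLOCK = ELEMENTS_PER_SIMD_INSTRUCTION * SIMD_THREADS_PER_BLOCK


def locate(element):
    """Return (thread_block, simd_thread, lane) for one element index.

    Assumes elements are assigned contiguously: first to Thread Blocks,
    then to SIMD threads within a block, then to lanes within a thread.
    """
    if not 0 <= element < VECTOR_LENGTH:
        raise IndexError(element)
    thread_block, offset = divmod(element, ELEMENTS_PER_BLOCK)
    simd_thread, lane = divmod(offset, ELEMENTS_PER_SIMD_INSTRUCTION)
    return thread_block, simd_thread, lane


def vector_multiply(b, c):
    """Compute a[i] = b[i] * c[i] by walking the Grid hierarchy sequentially."""
    a = [0.0] * VECTOR_LENGTH
    for block in range(THREAD_BLOCKS_PER_GRID):          # the Grid
        for thread in range(SIMD_THREADS_PER_BLOCK):     # one Thread Block
            base = (block * ELEMENTS_PER_BLOCK
                    + thread * ELEMENTS_PER_SIMD_INSTRUCTION)
            # One SIMD instruction covers these 32 elements.
            for lane in range(ELEMENTS_PER_SIMD_INSTRUCTION):
                i = base + lane
                a[i] = b[i] * c[i]
    return a
```

For example, under the contiguous assumption, locate(512) places element 512 in the first lane of the first SIMD thread of Thread Block 1. The nested loops run sequentially here; on a GPU the outer two levels are distributed by the Thread Block Scheduler and the Thread Scheduler, and the innermost loop corresponds to the lanes of a single SIMD instruction.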