Data-Level Parallelism in Vector, SIMD, and GPU Architectures - Computer Architecture: A Quantitative Approach

Jim Smith

International Symposium on Computer Architecture (1994)

Vector architectures grab sets of data elements scattered about memory, place them into large, sequential register files, operate on data in those register files, and then disperse the results back into memory. A single instruction operates on vectors of data, which results in dozens of register-register operations on independent data elements.
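
The gather, operate, disperse pattern described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not the hardware's actual behavior: memory is modeled as a flat list, a vector register as a list of 64 elements, and the helper names (`vector_load`, `vector_add`, `vector_store`) are hypothetical.

```python
# Toy model of a vector machine. All names and sizes here are illustrative
# assumptions, not part of any real instruction set.
MVL = 64  # assumed number of elements held by one vector register


def vector_load(memory, base, stride, length):
    """Gather `length` elements, `stride` apart, starting at `base`."""
    return [memory[base + i * stride] for i in range(length)]


def vector_add(va, vb):
    """One 'instruction': an independent add for every element pair."""
    return [a + b for a, b in zip(va, vb)]


def vector_store(memory, base, stride, reg):
    """Disperse the register contents back into memory."""
    for i, value in enumerate(reg):
        memory[base + i * stride] = value


memory = list(range(256))

v1 = vector_load(memory, 0, 2, MVL)  # even-indexed elements 0, 2, ..., 126
v2 = vector_load(memory, 1, 2, MVL)  # odd-indexed elements 1, 3, ..., 127
v3 = vector_add(v1, v2)              # 64 independent element operations
vector_store(memory, 128, 1, v3)     # write results to addresses 128..191
```

The single call to `vector_add` stands in for one vector instruction: it performs 64 additions, none of which depends on another.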

These large register files act as compiler-controlled buffers, both to hide memory latency and to leverage memory bandwidth. Since vector loads and stores are deeply pipelined, the program pays the long memory latency only once per vector load or store versus once per element, thus amortizing the latency over, say, 64 elements. Indeed, vector programs strive to keep memory busy.
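
A rough back-of-the-envelope model makes the amortization concrete. The sketch below assumes a fully pipelined memory that delivers one element per cycle once the first element has arrived, and it uses a hypothetical latency of 100 cycles; neither figure comes from the text, and real machines differ.

```python
def element_at_a_time_cycles(n, latency):
    """Cycles if every element pays the full memory latency on its own."""
    return n * latency


def pipelined_vector_cycles(n, latency):
    """Cycles if the latency is paid once and later elements stream in.

    Assumes one element is delivered per cycle after the first arrives.
    """
    return latency + (n - 1)


N = 64          # elements per vector load, as in the text's example
LATENCY = 100   # hypothetical memory latency in cycles (an assumption)

unpipelined = element_at_a_time_cycles(N, LATENCY)
pipelined = pipelined_vector_cycles(N, LATENCY)

print(f"element at a time: {unpipelined} cycles")
print(f"pipelined vector load: {pipelined} cycles")
print(f"effective latency per element: {pipelined / N:.2f} cycles")
```

Under these assumptions, the per-element cost falls from the full latency to a small fraction of it, which is the sense in which the latency is amortized over the vector.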

VMIPS

We begin with a vector processor consisting of the primary components that Figure 4.2 shows. This processor, which is loosely based on the Cray-1, is the foundation for discussion throughout this section. We will call this instruction set architecture VMIPS; its scalar portion is MIPS, and its vector portion is the logical vector extension of MIPS. The rest of this subsection examines how the basic architecture of VMIPS relates to other processors.
