Hardware Reference
In-Depth Information
With a vector instruction, the system can perform the operations on the vector data elements
in many ways, including operating on many elements simultaneously. This flexibility lets
vector designs use slow but wide execution units to achieve high performance at low power.
Further, the independence of elements within a vector instruction set allows scaling of functional
units without performing additional costly dependency checks, as superscalar processors
require.
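Because the elements of one vector instruction do not depend on each other, the hardware is free to choose how many of them to process per step. The following is a minimal Python sketch of that idea, not a model of any real machine: `vector_add` and its `lanes` parameter are hypothetical names introduced only for illustration, with `lanes` standing in for the number of parallel functional units.

```python
def vector_add(x, y, lanes):
    """Add two equal-length vectors, handling `lanes` elements per step.

    `lanes` is an illustrative stand-in for the number of parallel
    functional units; it is an assumption, not part of any real ISA.
    Returns the result vector and the number of steps taken.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = [0.0] * len(x)
    steps = 0
    for start in range(0, len(x), lanes):
        # Elements inside one group are independent of each other, so
        # hardware could compute them at the same time without any
        # dependency checks between them.
        for i in range(start, min(start + lanes, len(x))):
            result[i] = x[i] + y[i]
        steps += 1
    return result, steps


x = [float(i) for i in range(64)]
y = [2.0 * i for i in range(64)]
one_lane, steps_1 = vector_add(x, y, lanes=1)
four_lanes, steps_4 = vector_add(x, y, lanes=4)
```

The result is the same for any lane count; only the number of steps changes, which is the sense in which functional units can be scaled without altering the instruction's meaning.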
Vectors naturally accommodate varying data sizes. Hence, one view of a vector register size
is 64 64-bit data elements, but 128 32-bit elements, 256 16-bit elements, and even 512 8-bit
elements are equally valid views. Such hardware multiplicity is why a vector architecture can be
useful for multimedia applications as well as scientific applications.
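The element counts above all follow from dividing one fixed register width by the element width. As a small arithmetic sketch in Python (the names `REGISTER_BITS` and `element_count` are hypothetical and chosen here for illustration):

```python
# Register width implied by the text: 64 elements of 64 bits each.
REGISTER_BITS = 64 * 64


def element_count(element_bits):
    """Return how many elements of the given width fit in one register."""
    if REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return REGISTER_BITS // element_bits


# Map each element width (in bits) to the number of elements per register.
views = {bits: element_count(bits) for bits in (64, 32, 16, 8)}
```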
How Vector Processors Work: An Example
We can best understand a vector processor by looking at a vector loop for VMIPS. Let's take a
typical vector problem, which we use throughout this section:
    Y = a × X + Y
X and Y are vectors, initially resident in memory, and a is a scalar. This problem is the so-called
SAXPY or DAXPY loop that forms the inner loop of the Linpack benchmark. (SAXPY stands
for single-precision a × X plus Y; DAXPY for double-precision a × X plus Y.) Linpack is a
collection of linear algebra routines, and the Linpack benchmark consists of routines for performing
Gaussian elimination.
For now, let us assume that the number of elements, or length, of a vector register (64)
matches the length of the vector operation we are interested in. (This restriction will be lifted
shortly.)
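Before turning to assembly, it may help to see the DAXPY operation (a × X plus Y, stored back into Y) as a plain loop. This is a sketch in Python under the 64-element assumption just stated; the function name `daxpy`, the constant `VECTOR_LENGTH`, and the sample values are chosen here for illustration and do not come from Linpack itself.

```python
def daxpy(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i] for every i."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        # Each iteration touches only element i, so the iterations
        # are independent of one another.
        y[i] = a * x[i] + y[i]
    return y


VECTOR_LENGTH = 64  # matches the assumed vector register length
a = 2.0             # sample scalar, chosen arbitrarily
x = [float(i) for i in range(VECTOR_LENGTH)]
y = [1.0] * VECTOR_LENGTH
daxpy(a, x, y)
```

Python floats are double precision, so this corresponds to the DAXPY rather than the SAXPY form.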
Example
Show the code for MIPS and VMIPS for the DAXPY loop. Assume that the starting
addresses of X and Y are in Rx and Ry, respectively.
Answer
Here is the MIPS code.
      L.D     F0,a         ;load scalar a
      DADDIU  R4,Rx,#512   ;last address to load
Loop: L.D     F2,0(Rx)     ;load X[i]
      MUL.D   F2,F2,F0     ;a × X[i]
      L.D     F4,0(Ry)     ;load Y[i]
      ADD.D   F4,F4,F2     ;a × X[i] + Y[i]
      S.D     F4,0(Ry)     ;store into Y[i]
      DADDIU  Rx,Rx,#8     ;increment index to X
      DADDIU  Ry,Ry,#8     ;increment index to Y
      DSUBU   R20,R4,Rx    ;compute bound
      BNEZ    R20,Loop     ;check if done
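The loop bound in the MIPS listing works because 64 double-precision elements of 8 bytes each span 512 bytes, so Rx reaches R4 after exactly 64 increments. The following Python sketch mirrors only the loop-control arithmetic of that listing and counts the instructions executed. The starting address `0x1000` is an arbitrary assumption, and the function and constant names are hypothetical.

```python
ELEMENT_BYTES = 8        # size of one double-precision element
SETUP_INSTRUCTIONS = 2   # L.D F0,a and DADDIU R4,Rx,#512
LOOP_INSTRUCTIONS = 9    # instructions from the Loop label through BNEZ


def count_iterations(rx_start):
    """Mirror the Rx/R4/R20 arithmetic and return the iteration count."""
    rx = rx_start
    r4 = rx_start + 512          # DADDIU R4,Rx,#512
    iterations = 0
    while True:
        iterations += 1          # one pass through the loop body
        rx += ELEMENT_BYTES      # DADDIU Rx,Rx,#8
        r20 = r4 - rx            # DSUBU R20,R4,Rx
        if r20 == 0:             # BNEZ R20,Loop falls through at zero
            break
    return iterations


iterations = count_iterations(rx_start=0x1000)  # arbitrary example address
dynamic_instructions = SETUP_INSTRUCTIONS + LOOP_INSTRUCTIONS * iterations
```

Under these assumptions the scalar loop runs 64 times and executes 578 instructions in total, most of them loop overhead and per-element loads and stores.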