Vector-Length Registers: Handling Loops Not Equal To 64
A vector register processor has a natural vector length determined by the number of elements
in each vector register. This length, which is 64 for VMIPS, is unlikely to match the real vector
length in a program. Moreover, in a real program the length of a particular vector operation
is often unknown at compile time. In fact, a single piece of code may require different vector
lengths. For example, consider this code:
    for (i = 0; i < n; i = i + 1)
        Y[i] = a * X[i] + Y[i];
The size of all the vector operations depends on n, which may not even be known until run
time! The value of n might also be a parameter to a procedure containing the above loop and
therefore subject to change during execution.
The solution to these problems is to create a vector-length register (VLR). The VLR controls
the length of any vector operation, including a vector load or store. The value in the VLR,
however, cannot be greater than the length of the vector registers. This solves our problem
as long as the real length is less than or equal to the maximum vector length (MVL). The MVL
is the maximum number of data elements that a vector register holds in a given implementation
of the architecture. Because the MVL is a parameter rather than a constant fixed by the
instruction set, the length of vector registers can grow in later computer generations without
changing the instruction set; as we shall see in the next section, multimedia SIMD extensions
have no equivalent of MVL, so they change the instruction set every time they increase their
vector length.
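To make the role of the VLR concrete, here is a minimal software sketch of a single vector
DAXPY operation whose length is set by a VLR value. It is a model for illustration, not VMIPS
code: the function name vector_daxpy, the parameter vlr, and the choice of MVL = 64 (taken from
the VMIPS register length mentioned above) are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* assumed maximum vector length; 64 matches VMIPS in the text */

/* Hypothetical model of one vector DAXPY operation controlled by a VLR.
   Only the first `vlr` elements are processed; elements at index vlr and
   beyond are left unchanged, as if they were not part of the operation. */
void vector_daxpy(size_t vlr, double a, const double *x, double *y)
{
    assert(vlr <= MVL);  /* the VLR may not exceed the vector register length */
    for (size_t i = 0; i < vlr; i = i + 1)
        y[i] = a * x[i] + y[i];
}
```

Calling vector_daxpy(10, a, X, Y) updates only Y[0] through Y[9]. The same routine serves any
length up to MVL, which is the point of making the length a run-time value rather than a
property of the instruction.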
What if the value of n is not known at compile time and thus may be greater than the MVL?
To handle this case, where the vector is longer than the maximum length, a technique called
strip mining is used. Strip mining is the generation of code such that each vector operation
is done for a size less than or equal to the MVL. We create one loop that handles any number
of iterations that is a multiple of the MVL and another loop that handles the remaining
iterations, which must number fewer than the MVL. In practice, compilers usually create a
single strip-mined loop that is parameterized to handle both portions by changing the length.
We show the strip-mined version of the DAXPY loop in C:
    low = 0;
    VL = (n % MVL);                          /* find odd-size piece using modulo op % */
    for (j = 0; j <= (n / MVL); j = j + 1) { /* outer loop */
        for (i = low; i < (low + VL); i = i + 1) /* runs for length VL */
            Y[i] = a * X[i] + Y[i];          /* main operation */
        low = low + VL;                      /* start of next vector */
        VL = MVL;                            /* reset the length to maximum vector length */
    }
The term n/MVL represents truncating integer division. The effect of this loop is to block the
vector into segments that are then processed by the inner loop. The length of the first segment
is (n % MVL), and all subsequent segments are of length MVL. Figure 4.6 shows how to split the
long vector into segments.
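The strip-mined loop can be wrapped in a self-contained function and checked against the plain
scalar loop. The following is a sketch under stated assumptions: the function names
daxpy_strip_mined and daxpy_scalar are ours, MVL is fixed at 64 as for VMIPS, and the inner loop
stands in for what would be a single vector operation of length VL on real vector hardware.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* assumed maximum vector length; 64 matches VMIPS in the text */

/* Strip-mined DAXPY with the same structure as the loop in the text.
   The first segment has length n % MVL (possibly 0); every later
   segment has length MVL. */
void daxpy_strip_mined(size_t n, double a, const double *x, double *y)
{
    size_t low = 0;
    size_t vl = n % MVL;                      /* odd-size first piece */
    for (size_t j = 0; j <= n / MVL; j = j + 1) {
        assert(vl <= MVL);                    /* each piece fits in a vector register */
        for (size_t i = low; i < low + vl; i = i + 1)
            y[i] = a * x[i] + y[i];           /* stands in for one vector operation */
        low = low + vl;                       /* start of next segment */
        vl = MVL;                             /* all remaining segments are full length */
    }
}

/* Plain scalar loop, used only as a reference for comparison. */
void daxpy_scalar(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i = i + 1)
        y[i] = a * x[i] + y[i];
}
```

As a worked case, take n = 200 with MVL = 64. Then n % MVL = 8 and n/MVL = 3, so the outer loop
runs four times with segment lengths 8, 64, 64, and 64, which sum to 200. When n is an exact
multiple of the MVL, the first segment has length 0 and the first pass of the outer loop does
no work.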