One of the most difficult instances of these complex trade-offs occurs in a register-memory architecture in deciding how many times a variable should be referenced before it is cheaper to load it into a register. This threshold is hard to compute and, in fact, may vary among models of the same architecture.
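As a rough sketch of that threshold (the cycle costs below are invented for the arithmetic, not taken from any particular machine), suppose a memory operand costs one extra cycle relative to a register operand and a load costs three cycles; the compiler's decision then reduces to a comparison like the following:

    /* Hypothetical cost model for deciding whether to keep a variable
       in a register on a register-memory machine.  The constants are
       illustrative assumptions, not measurements of any real processor. */
    #include <stdbool.h>

    #define LOAD_COST        3   /* assumed cycles for the initial load     */
    #define MEM_OPERAND_COST 1   /* assumed extra cycles per memory operand */

    /* Returns true if loading the variable once and then referencing it
       from a register is cheaper than using a memory operand each time. */
    static bool keep_in_register(int num_references)
    {
        return LOAD_COST < num_references * MEM_OPERAND_COST;
    }

Under these made-up costs the load pays off once the variable is referenced four or more times; the point of the text is that real thresholds depend on encodings and pipeline details, which is why they can differ even among models of the same architecture.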
Provide instructions that bind the quantities known at compile time as constants — A compiler writer hates the thought of the processor interpreting at runtime a value that was known at compile time. Good counterexamples of this principle include instructions that interpret values that were fixed at compile time. For instance, the VAX procedure call instruction (calls) dynamically interprets a mask saying what registers to save on a call, but the mask is fixed at compile time (see Section A.10).
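As a small C illustration of this guideline (the function names and the multiply-by-eight example are purely illustrative), a value fixed at compile time can be bound into the instruction stream as an immediate or folded away entirely, whereas a value supplied at runtime must be interpreted on every execution:

    #include <stdint.h>

    /* The scale factor is fixed at compile time, so the compiler can fold it
       into the instruction, for example as an immediate shift (x << 3),
       rather than fetching and interpreting a value at runtime. */
    enum { SCALE = 8 };

    uint32_t scale_static(uint32_t x)  { return x * SCALE; }

    /* Here the scale arrives at runtime, so the processor must interpret it
       on every call, which is exactly what the guideline tries to avoid. */
    uint32_t scale_dynamic(uint32_t x, uint32_t scale) { return x * scale; }

The VAX calls mask is the same situation at the instruction-set level: the mask is a compile-time constant, yet the hardware re-reads and interprets it on every call.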
Compiler Support (or Lack Thereof) for Multimedia Instructions
Alas, the designers of the SIMD instructions (see Section 4.3 in Chapter 4 ) basically ignored the
previous subsection. These instructions tend to be solutions, not primitives; they are short of
registers; and the data types do not match existing programming languages. Architects hoped
to find an inexpensive solution that would help some users, but often only a few low-level
graphics library routines use them.
The SIMD instructions are really an abbreviated version of an elegant architecture style that has its own compiler technology. As explained in Section 4.2, vector architectures operate on vectors of data. Although they were invented originally for scientific codes, multimedia kernels are often vectorizable as well, albeit often with shorter vectors. As Section 4.3 suggests, we can think of Intel's MMX and SSE or PowerPC's AltiVec as simply short vector computers: MMX with vectors of eight 8-bit elements, four 16-bit elements, or two 32-bit elements, and AltiVec with vectors twice that length. They are implemented as simply adjacent, narrow elements in wide registers.
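To make "adjacent, narrow elements in wide registers" concrete, the following C sketch performs eight independent 8-bit additions inside one ordinary 64-bit integer, using masks to keep a carry in one element from spilling into its neighbor; an MMX packed-add instruction provides the same lane-wise behavior directly in hardware. The code is only an illustration of the data layout, not of how MMX is actually programmed.

    #include <stdint.h>

    /* Eight 8-bit elements packed side by side in a 64-bit word, added
       lane by lane.  The masks stop a carry in one element from crossing
       into the next, which dedicated SIMD hardware guarantees for free. */
    uint64_t add_8x8(uint64_t a, uint64_t b)
    {
        uint64_t low7 = (a & 0x7F7F7F7F7F7F7F7FULL) + (b & 0x7F7F7F7F7F7F7F7FULL);
        uint64_t top  = (a ^ b) & 0x8080808080808080ULL;
        return low7 ^ top;   /* per-element sum, modulo 256 in each lane */
    }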
These microprocessor architectures build the vector register size into the architecture: the
sum of the sizes of the elements is limited to 64 bits for MMX and 128 bits for AltiVec. When
Intel decided to expand to 128-bit vectors, it added a whole new set of instructions, called
Streaming SIMD Extension (SSE).
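For a concrete view of the 128-bit case, the sketch below uses Intel's SSE2 intrinsics from C to add sixteen 8-bit elements per operation; it assumes an x86 target with a compiler that provides <emmintrin.h>, and for brevity it assumes the array length is a multiple of 16.

    #include <emmintrin.h>   /* SSE2 intrinsics */

    /* Add two arrays of n bytes, 16 elements per 128-bit operation. */
    void add_bytes_sse2(const unsigned char *a, const unsigned char *b,
                        unsigned char *out, int n)
    {
        for (int i = 0; i < n; i += 16) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi8(va, vb));
        }
    }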
A major advantage of vector computers is hiding latency of memory access by loading many elements at once and then overlapping execution with data transfer. The goal of vector addressing modes is to collect data scattered about memory, place them in a compact form so that they can be operated on efficiently, and then place the results back where they belong.
Vector computers include strided addressing and gather/scatter addressing (see Section 4.2) to increase the number of programs that can be vectorized. Strided addressing skips a fixed number of words between each access, so sequential addressing is often called unit stride addressing. Gather and scatter find their addresses in another vector register: Think of it as register indirect addressing for vector computers. From a vector perspective, in contrast, these short-vector SIMD computers support only unit strided accesses: Memory accesses load or store all elements at once from a single wide memory location. Since the data for multimedia applications are often streams that start and end in memory, strided and gather/scatter addressing modes are essential to successful vectorization (see Section 4.7).
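In source terms, the three access patterns look like the plain C loops below; a vector machine can execute each loop body with vector loads and stores, while a unit-stride-only SIMD extension handles only the first pattern directly (the function names are illustrative).

    /* Unit stride: consecutive elements, the only pattern that short-vector
       SIMD extensions load and store directly. */
    void unit_stride(float *dst, const float *src, int n) {
        for (int i = 0; i < n; i++) dst[i] = src[i];
    }

    /* Strided: skip a fixed number of elements between accesses,
       for example walking one column of a row-major matrix. */
    void strided(float *dst, const float *src, int n, int stride) {
        for (int i = 0; i < n; i++) dst[i] = src[i * stride];
    }

    /* Gather: the addresses come from another vector (here, an index array),
       i.e., register indirect addressing for vectors.  A scatter is the
       mirror image, with the indices applied on the store side. */
    void gather(float *dst, const float *src, const int *idx, int n) {
        for (int i = 0; i < n; i++) dst[i] = src[idx[i]];
    }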