One of the most difficult instances of such complex trade-offs occurs in a register-memory architecture in deciding how many times a variable should be referenced before it is cheaper to load it into a register. This threshold is hard to compute and, in fact, may vary among models of the same architecture.
■ Provide instructions that bind the quantities known at compile time as constants—A compiler writer hates the thought of the processor interpreting at runtime a value that was known at compile time. Good counterexamples of this principle include instructions that interpret values that were fixed at compile time. For instance, the VAX procedure call instruction (calls) dynamically interprets a mask saying what registers to save on a call, but the mask is fixed at compile time (see Section A.10).
Compiler Support (Or Lack Thereof) For Multimedia Instructions
Alas, the designers of the SIMD instructions (see Section 4.3 in Chapter 4) basically ignored the previous subsection. These instructions tend to be solutions, not primitives; they are short of registers; and the data types do not match existing programming languages. Architects hoped to find an inexpensive solution that would help some users, but often only a few low-level graphics library routines use them.
The SIMD instructions are really an abbreviated version of an elegant architecture style that has its own compiler technology. As explained in Section 4.2, vector architectures operate on vectors of data. Invented originally for scientific codes, multimedia kernels are often vectorizable as well, albeit often with shorter vectors. As Section 4.3 suggests, we can think of Intel's MMX and SSE or PowerPC's AltiVec as simply short vector computers: MMX with vectors of eight 8-bit elements, four 16-bit elements, or two 32-bit elements, and AltiVec with vectors twice that length. They are implemented as simply adjacent, narrow elements in wide registers.
These microprocessor architectures build the vector register size into the architecture: the
sum of the sizes of the elements is limited to 64 bits for MMX and 128 bits for AltiVec. When
Intel decided to expand to 128-bit vectors, it added a whole new set of instructions, called
Streaming SIMD Extension (SSE).
A major advantage of vector computers is hiding the latency of memory access by loading many elements at once and then overlapping execution with data transfer. The goal of vector addressing modes is to collect data scattered about memory, place them in a compact form so that they can be operated on efficiently, and then place the results back where they belong. Vector computers include strided addressing and gather/scatter addressing (see Section 4.2) to increase the number of programs that can be vectorized. Strided addressing skips a fixed number of words between each access, so sequential addressing is often called unit stride addressing. Gather and scatter find their addresses in another vector register: Think of it as register indirect addressing for vector computers. From a vector perspective, in contrast, these short-vector SIMD computers support only unit strided accesses: Memory accesses load or store all elements at once from a single wide memory location. Since the data for multimedia applications are often streams that start and end in memory, strided and gather/scatter addressing modes are essential to successful vectorization (see Section 4.7).
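The three addressing patterns can be made concrete as scalar loops. This is a sketch with illustrative names, not any real ISA's semantics: each function loads n elements into a "vector register" array using a different pattern.

```c
#include <stddef.h>

/* Unit stride: consecutive elements. This is the only pattern the
   short-vector SIMD loads described above support directly. */
void load_unit(const int *mem, int *vreg, int n) {
    for (int i = 0; i < n; i++)
        vreg[i] = mem[i];
}

/* Strided: skip a fixed number of elements between accesses,
   e.g. walking one column of a row-major matrix. */
void load_strided(const int *mem, int *vreg, int n, int stride) {
    for (int i = 0; i < n; i++)
        vreg[i] = mem[i * stride];
}

/* Gather: a second index vector supplies each element's location,
   i.e. register indirect addressing for vectors. */
void load_gather(const int *mem, int *vreg, const int *idx, int n) {
    for (int i = 0; i < n; i++)
        vreg[i] = mem[idx[i]];
}
```

Scatter is the symmetric store, `mem[idx[i]] = vreg[i]`. A vector machine issues each of these as one instruction and overlaps the element transfers with execution; a short-vector SIMD machine can issue only the first as a single wide load.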