One of the most difficult instances of these complex trade-offs occurs in a register-memory architecture in deciding how many times a variable should be referenced before it is cheaper to load it into a register. This threshold is hard to compute and, in fact, may vary among models of the same architecture.
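As a rough sketch of that threshold (the cycle costs below are invented for the arithmetic, not taken from any particular machine), suppose a memory operand costs one extra cycle relative to a register operand and a load costs three cycles; the compiler's decision then reduces to a comparison like the following:

    /* Hypothetical cost model for deciding whether to keep a variable
       in a register on a register-memory machine.  The constants are
       illustrative assumptions, not measurements of any real processor. */
    #include <stdbool.h>

    #define LOAD_COST        3   /* assumed cycles for the initial load     */
    #define MEM_OPERAND_COST 1   /* assumed extra cycles per memory operand */

    /* Returns true if loading the variable once and then referencing it
       from a register is cheaper than using a memory operand each time. */
    static bool keep_in_register(int num_references)
    {
        return LOAD_COST < num_references * MEM_OPERAND_COST;
    }

Under these made-up costs the load pays off once the variable is referenced four or more times; the point of the text is that real thresholds depend on encodings and pipeline details, which is why they can differ even among models of the same architecture.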
Provide instructions that bind the quantities known at compile time as constants — A compiler writer hates the thought of the processor interpreting at runtime a value that was known at compile time. Good counterexamples of this principle include instructions that interpret values that were fixed at compile time. For instance, the VAX procedure call instruction (calls) dynamically interprets a mask saying what registers to save on a call, but the mask is fixed at compile time (see Section A.10).
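As a small C illustration of this guideline (the function names and the multiply-by-eight example are purely illustrative), a value fixed at compile time can be bound into the instruction stream as an immediate or folded away entirely, whereas a value supplied at runtime must be interpreted on every execution:

    #include <stdint.h>

    /* The scale factor is fixed at compile time, so the compiler can fold it
       into the instruction, for example as an immediate shift (x << 3),
       rather than fetching and interpreting a value at runtime. */
    enum { SCALE = 8 };

    uint32_t scale_static(uint32_t x)  { return x * SCALE; }

    /* Here the scale arrives at runtime, so the processor must interpret it
       on every call, which is exactly what the guideline tries to avoid. */
    uint32_t scale_dynamic(uint32_t x, uint32_t scale) { return x * scale; }

The VAX calls mask is the same situation at the instruction-set level: the mask is a compile-time constant, yet the hardware re-reads and interprets it on every call.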
Compiler Support (or Lack Thereof) for Multimedia Instructions
Alas, the designers of the SIMD instructions (see Section 4.3 in Chapter 4 ) basically ignored the
previous subsection. These instructions tend to be solutions, not primitives; they are short of
registers; and the data types do not match existing programming languages. Architects hoped
to find an inexpensive solution that would help some users, but often only a few low-level
graphics library routines use them.
The SIMD instructions are really an abbreviated version of an elegant architecture style that has its own compiler technology. As explained in Section 4.2, vector architectures operate on vectors of data. Although they were invented originally for scientific codes, multimedia kernels are often vectorizable as well, albeit often with shorter vectors. As Section 4.3 suggests, we can think of Intel's MMX and SSE or PowerPC's AltiVec as simply short vector computers: MMX with vectors of eight 8-bit elements, four 16-bit elements, or two 32-bit elements, and AltiVec with vectors twice that length. They are implemented as simply adjacent, narrow elements in wide registers.
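To make "adjacent, narrow elements in wide registers" concrete, the following C sketch performs eight independent 8-bit additions inside one ordinary 64-bit integer, using masks to keep a carry in one element from spilling into its neighbor; an MMX packed-add instruction provides the same lane-wise behavior directly in hardware. The code is only an illustration of the data layout, not of how MMX is actually programmed.

    #include <stdint.h>

    /* Eight 8-bit elements packed side by side in a 64-bit word, added
       lane by lane.  The masks stop a carry in one element from crossing
       into the next, which dedicated SIMD hardware guarantees for free. */
    uint64_t add_8x8(uint64_t a, uint64_t b)
    {
        uint64_t low7 = (a & 0x7F7F7F7F7F7F7F7FULL) + (b & 0x7F7F7F7F7F7F7F7FULL);
        uint64_t top  = (a ^ b) & 0x8080808080808080ULL;
        return low7 ^ top;   /* per-element sum, modulo 256 in each lane */
    }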
These microprocessor architectures build the vector register size into the architecture: the
sum of the sizes of the elements is limited to 64 bits for MMX and 128 bits for AltiVec. When
Intel decided to expand to 128-bit vectors, it added a whole new set of instructions, called
Streaming SIMD Extension (SSE).
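For a concrete view of the 128-bit case, the sketch below uses Intel's SSE2 intrinsics from C to add sixteen 8-bit elements per operation; it assumes an x86 target with a compiler that provides <emmintrin.h>, and for brevity it assumes the array length is a multiple of 16.

    #include <emmintrin.h>   /* SSE2 intrinsics */

    /* Add two arrays of n bytes, 16 elements per 128-bit operation. */
    void add_bytes_sse2(const unsigned char *a, const unsigned char *b,
                        unsigned char *out, int n)
    {
        for (int i = 0; i < n; i += 16) {
            __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi8(va, vb));
        }
    }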
A major advantage of vector computers is hiding latency of memory access by loading many elements at once and then overlapping execution with data transfer. The goal of vector addressing modes is to collect data scattered about memory, place them in a compact form so that they can be operated on efficiently, and then place the results back where they belong.
Vector computers include strided addressing and gather/scatter addressing (see Section 4.2) to increase the number of programs that can be vectorized. Strided addressing skips a fixed number of words between each access, so sequential addressing is often called unit stride addressing. Gather and scatter find their addresses in another vector register: Think of it as register indirect addressing for vector computers. From a vector perspective, in contrast, these short-vector SIMD computers support only unit strided accesses: Memory accesses load or store all elements at once from a single wide memory location. Since the data for multimedia applications are often streams that start and end in memory, strided and gather/scatter addressing modes are essential to successful vectorization (see Section 4.7).
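In source terms, the three access patterns look like the plain C loops below; a vector machine can execute each loop body with vector loads and stores, while a unit-stride-only SIMD extension handles only the first pattern directly (the function names are illustrative).

    /* Unit stride: consecutive elements, the only pattern that short-vector
       SIMD extensions load and store directly. */
    void unit_stride(float *dst, const float *src, int n) {
        for (int i = 0; i < n; i++) dst[i] = src[i];
    }

    /* Strided: skip a fixed number of elements between accesses,
       for example walking one column of a row-major matrix. */
    void strided(float *dst, const float *src, int n, int stride) {
        for (int i = 0; i < n; i++) dst[i] = src[i * stride];
    }

    /* Gather: the addresses come from another vector (here, an index array),
       i.e., register indirect addressing for vectors.  A scatter is the
       mirror image, with the indices applied on the store side. */
    void gather(float *dst, const float *src, const int *idx, int n) {
        for (int i = 0; i < n; i++) dst[i] = src[idx[i]];
    }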