Example
As an example, compare a vector computer to MMX for color representation
conversion of pixels from RGB (red, green, blue) to YUV (luminosity,
chrominance), with each pixel represented by 3 bytes. The conversion is just three
lines of C code placed in a loop:
Y = (9798*R + 19235*G + 3736*B) / 32768;
U = (-4784*R - 9437*G + 4221*B) / 32768 + 128;
V = (20218*R - 16941*G - 3277*B) / 32768 + 128;
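A minimal runnable version of that loop is sketched below. The function name `rgb_to_yuv`, the packed R,G,B input layout with a matching packed Y,U,V output, and the helper `clamp_u8` are assumptions for illustration. Clamping is needed if results are written back as bytes, because V can leave the 0..255 range: for a pure-red pixel (255, 0, 0) it evaluates to 157 + 128 = 285.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: saturate an int to the 0..255 byte range. */
static unsigned char clamp_u8(int x)
{
    if (x < 0) return 0;
    if (x > 255) return 255;
    return (unsigned char)x;
}

/* Scalar RGB -> YUV conversion applying the three formulas to each pixel.
 * Assumption: rgb and yuv each hold n packed 3-byte pixels
 * (R,G,B in; Y,U,V out). */
void rgb_to_yuv(const unsigned char *rgb, unsigned char *yuv, size_t n)
{
    assert(rgb != NULL && yuv != NULL);
    for (size_t i = 0; i < n; i++) {
        int R = rgb[3 * i], G = rgb[3 * i + 1], B = rgb[3 * i + 2];
        /* C integer division truncates toward zero. */
        int Y = (9798 * R + 19235 * G + 3736 * B) / 32768;
        int U = (-4784 * R - 9437 * G + 4221 * B) / 32768 + 128;
        int V = (20218 * R - 16941 * G - 3277 * B) / 32768 + 128;
        yuv[3 * i]     = clamp_u8(Y);
        yuv[3 * i + 1] = clamp_u8(U);
        yuv[3 * i + 2] = clamp_u8(V);
    }
}
```

For the pure-red pixel the function stores Y = 76, U = 91, and V = 255 (clamped from 285); a black pixel maps to Y = 0, U = 128, V = 128.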
A 64-bit-wide vector computer can calculate 8 pixels simultaneously. A vector
computer for media with strided addresses takes:
■ 3 vector loads (to get RGB)
■ 3 vector multiplies (to convert R)
■ 6 vector multiply adds (to convert G and B)
■ 3 vector shifts (to divide by 32,768)
■ 2 vector adds (to add 128)
■ 3 vector stores (to store YUV)
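The instruction sequence in this list can be emulated in portable C to make the count concrete; each helper call below stands for one vector instruction. This is a sketch under stated assumptions: the vector length `VL` of 8, the `vec` type, and the helper names (`vload_strided`, `vmul`, `vmadd`, `vsra`, `vadd`, `vstore_strided`) are hypothetical stand-ins, not any real ISA's mnemonics, and the saturating store is an assumption added because V can exceed 255.

```c
#include <assert.h>
#include <stdint.h>

#define VL 8 /* assumed vector length: 8 pixels per vector operation */

/* Hypothetical vector register: VL lanes of 32-bit intermediates. */
typedef struct { int32_t e[VL]; } vec;

/* Strided load: lane i reads base[i * stride]. */
static vec vload_strided(const uint8_t *base, int stride)
{
    vec v;
    for (int i = 0; i < VL; i++) v.e[i] = base[i * stride];
    return v;
}

/* Strided store; saturating each lane to 0..255 is an assumption. */
static void vstore_strided(uint8_t *base, int stride, vec v)
{
    for (int i = 0; i < VL; i++) {
        int32_t x = v.e[i] < 0 ? 0 : (v.e[i] > 255 ? 255 : v.e[i]);
        base[i * stride] = (uint8_t)x;
    }
}

static vec vmul(vec a, int32_t k)              /* a * k */
{
    for (int i = 0; i < VL; i++) a.e[i] *= k;
    return a;
}

static vec vmadd(vec acc, vec a, int32_t k)    /* acc + a * k */
{
    for (int i = 0; i < VL; i++) acc.e[i] += a.e[i] * k;
    return acc;
}

/* Arithmetic right shift. In C, >> on a negative signed value is
 * implementation-defined; GCC, Clang, and MSVC shift arithmetically. */
static vec vsra(vec a, int s)
{
    for (int i = 0; i < VL; i++) a.e[i] >>= s;
    return a;
}

static vec vadd(vec a, int32_t k)              /* a + k */
{
    for (int i = 0; i < VL; i++) a.e[i] += k;
    return a;
}

/* Converts VL packed RGB pixels to packed YUV using 20 vector operations. */
void rgb_to_yuv_vl(const uint8_t *rgb, uint8_t *yuv)
{
    assert(rgb && yuv);
    /* 3 strided loads: R, G, and B bytes are 3 bytes apart */
    vec R = vload_strided(rgb + 0, 3);
    vec G = vload_strided(rgb + 1, 3);
    vec B = vload_strided(rgb + 2, 3);
    /* 3 multiplies: the R terms */
    vec Y = vmul(R, 9798);
    vec U = vmul(R, -4784);
    vec V = vmul(R, 20218);
    /* 6 multiply-adds: the G and B terms */
    Y = vmadd(Y, G, 19235);  Y = vmadd(Y, B, 3736);
    U = vmadd(U, G, -9437);  U = vmadd(U, B, 4221);
    V = vmadd(V, G, -16941); V = vmadd(V, B, -3277);
    /* 3 shifts: divide by 32768 = 2^15 */
    Y = vsra(Y, 15); U = vsra(U, 15); V = vsra(V, 15);
    /* 2 adds: the +128 offsets for U and V */
    U = vadd(U, 128); V = vadd(V, 128);
    /* 3 strided stores */
    vstore_strided(yuv + 0, 3, Y);
    vstore_strided(yuv + 1, 3, U);
    vstore_strided(yuv + 2, 3, V);
}
```

One design consequence is worth noting: a shift rounds toward negative infinity, whereas C's `/` truncates toward zero, so U or V can differ by one from the C statements when the intermediate sum is negative. For a pure-red pixel this sketch yields U = 90, while the C code's division gives 91.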
The total is 20 instructions to perform the 20 operations in the previous C
code to convert 8 pixels [Kozyrakis 2000]. (Since a vector might have 32 64-bit
elements, this code actually converts up to 32 × 8, or 256, pixels.)
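The counts in this paragraph can be checked directly. The three C statements contain 9 multiplications, 6 additions or subtractions inside the weighted sums, 3 divisions, and 2 additions of 128; the list sums to 20 instructions; and 32 elements of 8 bytes each cover 256 pixels:

```latex
\begin{aligned}
\text{operations}   &= 9 + 6 + 3 + 2 = 20 \\
\text{instructions} &= 3 + 3 + 6 + 3 + 2 + 3 = 20 \\
\text{pixels}       &= 32 \times 8 = 256
\end{aligned}
```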
In contrast, Intel's Web site shows that a library routine to perform the same
calculation on 8 pixels takes 116 MMX instructions plus 6 80×86 instructions [Intel 2001].
This sixfold increase in instructions is due to the large number of instructions needed
to load and unpack RGB pixels and to pack and store YUV pixels, since MMX has no strided
memory accesses.
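The "sixfold" figure is simply the ratio of the two instruction counts quoted above:

```latex
\frac{116 + 6}{20} = \frac{122}{20} = 6.1 \approx 6
```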
Having short, architecture-limited vectors with few registers and simple memory addressing
modes makes it more difficult to use vectorizing compiler technology. Hence, these SIMD
instructions are more likely to be found in hand-coded libraries than in compiled code.
Summary: The Role of Compilers
This section leads to several recommendations. First, we expect a new instruction set architecture
to have at least 16 general-purpose registers (not counting separate registers for floating-point
numbers) to simplify allocation of registers using graph coloring. The advice on orthogonality
suggests that all supported addressing modes apply to all instructions that transfer data.
Finally, the last three pieces of advice (provide primitives instead of solutions, simplify
trade-offs between alternatives, don't bind constants at runtime) all suggest that it is better
to err on the side of simplicity. In other words, understand that less is more in the design of
an instruction set. Alas, SIMD extensions are more an example of good marketing than of
outstanding achievement of hardware-software co-design.