FIGURE 4.7 Level of vectorization among the Perfect Club benchmarks when executed
on the Cray Y-MP [Vajapeyam 1991]. The first column shows the vectorization level obtained
with the compiler without hints, while the second column shows the results after the codes
have been improved with hints from a team of Cray Research programmers.
The hint-rich versions show significant gains in vectorization level for codes the compiler
could not vectorize well by itself, with all codes now above 50% vectorization. The median
vectorization improved from about 70% to about 90%.
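To show why the jump in the median matters, here is an illustrative calculation. It does not use data from Figure 4.7. It applies Amdahl's Law and makes two simplifying assumptions. First, the vectorization level is treated as the fraction f of the original execution time that can run in vector mode. Second, vector mode is taken to be a hypothetical S = 10 times faster than scalar execution.

```latex
\begin{aligned}
\text{Speedup}(f)   &= \frac{1}{(1 - f) + f/S}, \qquad S = 10 \\
\text{Speedup}(0.7) &= \frac{1}{0.3 + 0.07} = \frac{1}{0.37} \approx 2.7 \\
\text{Speedup}(0.9) &= \frac{1}{0.1 + 0.09} = \frac{1}{0.19} \approx 5.3
\end{aligned}
```

Under these assumptions, raising vectorization from 70% to 90% roughly doubles the overall speedup, even though the vectorizable fraction grows by only 20 percentage points.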
4.3 SIMD Instruction Set Extensions for Multimedia
SIMD Multimedia Extensions started with the simple observation that many media applications
operate on narrower data types than the 32-bit processors were optimized for. Many graphics
systems used 8 bits to represent each of the three primary colors plus 8 bits for transparency.
Depending on the application, audio samples are usually represented with 8 or 16 bits. By
partitioning the carry chains within, say, a 256-bit adder, a processor could perform
simultaneous operations on short vectors of thirty-two 8-bit operands, sixteen 16-bit operands,
eight 32-bit operands, or four 64-bit operands. The additional cost of such partitioned adders
was small. Figure 4.8 summarizes typical multimedia SIMD instructions. Like vector instructions,
a SIMD instruction specifies the same operation on vectors of data. Vector machines have large
register files; the VMIPS vector register file, for example, holds as many as sixty-four 64-bit
elements in each of its 8 vector registers. SIMD instructions, by contrast, tend to specify
fewer operands and hence use much smaller register files.
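The partitioned-adder idea can be imitated in software with a technique often called SWAR (SIMD within a register). The sketch below is a minimal version of it. It uses a 64-bit word instead of the 256-bit adder in the text, because standard C has no portable 256-bit integer type. The function names `lane_high_bits` and `partitioned_add` are hypothetical. Hardware breaks the carry chain at each lane boundary. The sketch gets the same effect in two steps. It first clears the top bit of every lane before adding, so no carry can leave a lane. It then rebuilds those top bits with XOR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a mask with the most significant bit of every
   lane set, e.g. 0x8080808080808080 for 8-bit lanes. */
static uint64_t lane_high_bits(unsigned lane_bits)
{
    uint64_t mask = 0;
    for (unsigned pos = lane_bits - 1; pos < 64; pos += lane_bits)
        mask |= (uint64_t)1 << pos;
    return mask;
}

/* Hypothetical partitioned add on one 64-bit word (assumed width):
   eight 8-bit, four 16-bit, two 32-bit, or one 64-bit lane. Each lane
   wraps modulo 2^lane_bits and no carry crosses a lane boundary. */
static uint64_t partitioned_add(uint64_t a, uint64_t b, unsigned lane_bits)
{
    assert(lane_bits == 8 || lane_bits == 16 || lane_bits == 32 || lane_bits == 64);
    const uint64_t high = lane_high_bits(lane_bits);

    /* Add with each lane's top bit cleared: the largest per-lane sum,
       2 * (2^(w-1) - 1), still fits in the lane, so no carry escapes
       into the neighboring lane. */
    uint64_t low_sum = (a & ~high) + (b & ~high);

    /* Each lane's top bit is a_top XOR b_top XOR carry-in. The carry-in
       already sits in low_sum, so XOR in a_top XOR b_top; the lane's
       carry-out is simply discarded. */
    return low_sum ^ ((a ^ b) & high);
}
```

For example, `partitioned_add(0x00FF00FF00FF00FF, 0x0001000100010001, 8)` returns 0, because each 0xFF byte wraps to 0x00 independently. The same inputs with 16-bit lanes return 0x0100010001000100, because there the carry stays inside the wider lane.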