Hardware Reference
In-Depth Information
The multiply unit handles both integer and floating-point multiplications. The
next three units handle floating-point additions/subtractions, compares, and square
roots and divisions, respectively.
Branch operations are executed by the branch unit. There is a fixed 3-cycle
delay after a branch, so the three instructions (up to 15 operations) following a
branch are always executed, even for unconditional branches.
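The visible effect of the 3-cycle branch delay can be sketched with a toy interpreter. This is only an illustration under simplifying assumptions, not a model of the real pipeline: each list entry stands for one whole instruction, the names `run` and `BRANCH_DELAY` are invented here, and a branch placed inside another branch's delay slots is simply ignored.

```python
BRANCH_DELAY = 3  # instructions that still execute after a branch (from the text)

def run(program):
    """Run a list of (label, branch_target_or_None) pairs.

    Returns the labels in the order they were executed.
    """
    trace = []
    pc = 0
    pending_target = None  # target of a branch that has not taken effect yet
    slots_left = 0         # delay-slot instructions still to execute
    while pc < len(program):
        label, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending_target is not None:
            # We are inside the delay slots of an earlier branch.
            slots_left -= 1
            if slots_left == 0:
                next_pc = pending_target
                pending_target = None
        elif target is not None:
            # A branch: remember the target, but keep executing sequentially.
            pending_target = target
            slots_left = BRANCH_DELAY
        pc = next_pc
    return trace

program = [
    ("A", None),
    ("JMP", 6),         # unconditional branch to index 6
    ("d1", None),       # delay slot 1: executed
    ("d2", None),       # delay slot 2: executed
    ("d3", None),       # delay slot 3: executed
    ("skipped", None),  # never reached
    ("target", None),
]
print(run(program))
```

With five operation slots per instruction, the three delay-slot instructions account for the "up to 15 operations" mentioned above.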
Finally, we come to the two multimedia units, which handle the special
multimedia operations. The DSP in the name of the functional unit refers to the
Digital Signal Processor, which the multimedia operations effectively replace. We will
describe the multimedia operations briefly below. One noteworthy feature is that
they all use saturated arithmetic instead of the two's complement arithmetic used by
the integer operations. When an operation produces a result that cannot be
represented because of overflow, instead of generating an exception or giving a
garbage result, the closest valid number is used. For example, with 8-bit unsigned
numbers, adding 130 and 130 gives 255.
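The difference between saturated and wraparound arithmetic can be seen in a few lines of Python. This is a minimal sketch, not the TriMedia operations themselves; the function names are invented for illustration.

```python
def saturate(value, lo, hi):
    """Clamp value to the closest representable number in [lo, hi]."""
    return max(lo, min(value, hi))

def sat_add_u8(a, b):
    """Saturated add of two unsigned 8-bit numbers (range 0..255)."""
    return saturate(a + b, 0, 255)

def wrap_add_u8(a, b):
    """Ordinary wraparound add of two unsigned 8-bit numbers (modulo 256)."""
    return (a + b) & 0xFF

def sat_add_s8(a, b):
    """Saturated add of two signed 8-bit numbers (range -128..127)."""
    return saturate(a + b, -128, 127)

print(sat_add_u8(130, 130))    # 255: clamped to the largest 8-bit value
print(wrap_add_u8(130, 130))   # 4: 260 modulo 256
print(sat_add_s8(100, 100))    # 127
print(sat_add_s8(-100, -100))  # -128
```

Clamping is usually what is wanted for pixel and audio samples: an overly bright pixel stays white instead of wrapping around to nearly black.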
Because not every operation can appear in every slot, often an instruction does
not contain all five potential operations. When a slot is not used, it is compacted to
minimize the amount of space wasted. Operations that are present occupy 26, 34,
or 42 bits. Depending on the number of operations actually present, TriMedia in-
structions vary from 2 to 28 bytes, including some fixed overhead.
The TriMedia does not make run-time checks to see whether the operations in
an instruction are compatible. If they are not, it just executes them anyway and
gets the wrong answer. Leaving the check out was a deliberate decision to save
time and transistors. The Core i7 does do run-time checking to make sure all the
superscalar operations are compatible, but at a huge cost in complexity, time, and
transistors. The TriMedia avoids this expense by putting the burden of scheduling
on the compiler, which has all the time in the world to carefully optimize the place-
ment of operations in instruction words. On the other hand, if an operation needs a
functional unit that is not available, the instruction will stall until it becomes avail-
able.
As in the Itanium-2, TriMedia operations are predicated. Each operation (with
two minor exceptions) specifies a register that is tested before the operation is ex-
ecuted. If the low-order bit of the register is set, the operation is executed; other-
wise, it is skipped. Each of the (up to) five operations is individually predicated.
An example of a predicated operation is
IF R2 IADD R4, R5 -> R8
which tests R2 and, if the low-order bit is 1, adds R4 to R5 and stores the result in
R8. An operation can be made unconditional by using R1 (which is always 1) as
the predicate register. Using R0 (which is always 0) makes it a no-op.
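The predication rule can be sketched as a small Python function. This is an illustration only: the register file is modeled as a dictionary, the 32-bit register width is an assumption, and the helper names (`iadd`, `predicated`) are made up here.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit registers, for illustration

def iadd(a, b):
    """Two's complement integer add, truncated to 32 bits."""
    return (a + b) & MASK32

def predicated(regs, pred, op, src1, src2, dst):
    """Model IF Rpred OP Rsrc1, Rsrc2 -> Rdst.

    The operation runs only if the low-order bit of the predicate
    register is 1; otherwise the destination is left untouched.
    """
    if regs[pred] & 1:
        regs[dst] = op(regs[src1], regs[src2])

# R0 is always 0 and R1 is always 1, as stated in the text.
regs = {0: 0, 1: 1, 2: 3, 4: 10, 5: 20, 8: 0}

predicated(regs, 2, iadd, 4, 5, 8)  # IF R2 IADD R4, R5 -> R8, with R2 = 3
print(regs[8])                      # 30: low-order bit of R2 is 1

regs[8] = 0
regs[2] = 2                         # low-order bit is now 0
predicated(regs, 2, iadd, 4, 5, 8)
print(regs[8])                      # 0: operation skipped

predicated(regs, 1, iadd, 4, 5, 8)  # R1 as predicate: unconditional
print(regs[8])                      # 30

regs[8] = 7
predicated(regs, 0, iadd, 4, 5, 8)  # R0 as predicate: no-op
print(regs[8])                      # 7
```

Note that only the low-order bit matters: a predicate value of 2 skips the operation even though it is nonzero.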
The TriMedia multimedia operations can be grouped into the 15 groups listed
in Fig. 8-5. Many of the operations involve clipping, which specifies an operand
 