Hardware Reference
In-Depth Information
[Figure: block diagram with stage labels ID, E0, and EX. Inputs: Vector-A and
Vector-B. Blocks shown: eight MSB Adj. blocks, Exp. Adders 0-3, Multiplier
Arrays 0-3, Exp. Diffs. 01, 02, 03, 12, 13, and 23, Max. Exp., CPA0-3, MUX0-3,
EMUX, four Dec. blocks, Aligners 0-3, and a 4-to-2 Reduction Array. Outputs:
VEC output (Exponent) and VEC output (Mantissa).]
Fig. 3.23 Structure of FPU VEC block
At the E0 stage, Multiplier Arrays 0-3 and Exp. Adders 0-3 produce the mantissas
and exponents of the four intermediate products, respectively. Since the FIPR and
FTRV definitions allow an error of “2^(E-25) + rounding error of result,” the
multipliers need not produce an exact value, and they can be made smaller by
properly eliminating the lower-bit calculations within the allowed error. Then,
Exp. Diffs. 01, 02, 03, 12, 13, and 23 generate the exponent differences for all
six pairs of products, Max. Exp. determines the maximum exponent from the signs
of the six differences, and MUX0-3 each select either one of the six differences
or zero so that the mantissas can be aligned to the mantissa of the product with
the maximum exponent. Zero is selected for the maximum-exponent product itself.
Further, EMUX selects the maximum exponent as the exponent of the VEC output.
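
The E0-stage selection logic described above can be illustrated in software. The following is a minimal sketch, not the actual circuit: the function name e0_select, the use of plain Python integers for exponents, and the tie-breaking rule (the lowest-numbered product wins when exponents are equal) are assumptions made for illustration.

```python
from itertools import combinations

def e0_select(exps):
    """Model of Exp. Diffs., Max. Exp., MUX0-3, and EMUX for four exponents.

    Returns (max_exp, shifts), where shifts[k] is the right-shift amount
    that aligns product k to the maximum-exponent product.
    """
    assert len(exps) == 4

    # Exp. Diffs. 01, 02, 03, 12, 13, 23: one difference per pair (i < j).
    diff = {(i, j): exps[i] - exps[j] for i, j in combinations(range(4), 2)}

    # Max. Exp.: decide the maximum using only the signs of the differences.
    # Assumed tie-break: on equal exponents the lower index is treated as max.
    def is_max(i):
        for j in range(4):
            if j == i:
                continue
            if i < j and diff[(i, j)] < 0:    # product j is strictly larger
                return False
            if j < i and diff[(j, i)] >= 0:   # product j is larger or wins the tie
                return False
        return True

    max_idx = next(i for i in range(4) if is_max(i))

    # MUX0-3: pick the difference involving the maximum, or zero for the
    # maximum itself. A stored difference may have the wrong sign, so it is
    # negated when needed to obtain a non-negative right-shift amount.
    shifts = []
    for k in range(4):
        if k == max_idx:
            shifts.append(0)
        elif max_idx < k:
            shifts.append(diff[(max_idx, k)])
        else:
            shifts.append(-diff[(k, max_idx)])

    # EMUX: the maximum exponent becomes the exponent of the VEC output.
    return exps[max_idx], shifts
```

For example, e0_select([5, 9, 9, 2]) returns (9, [4, 0, 0, 7]): product 1 is taken as the maximum, and products 0 and 3 are shifted right by 4 and 7 bits.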
At the EX stage, Aligners 0-3 align the mantissas by the four selected
differences. Each difference can be positive or negative depending on which
product has the maximum exponent, but the shift direction for the alignment is
always right, and the proper adjustment is made when the difference is decoded.
A 4-to-2 Reduction Array reduces the four aligned mantissas to two, which form
the sum and carry of the mantissa of the VEC output. The VEC output is received
by the MAIN block at the MUX of the EX stage.
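
The alignment and 4-to-2 reduction can be sketched as follows. This is an illustrative model only: it treats the mantissas as non-negative integers, ignores product signs and bit widths, and builds the 4-to-2 reduction from two 3-to-2 carry-save adders, which is one common construction and not necessarily the one used in the actual array. The names csa and align_and_reduce are hypothetical.

```python
def csa(a, b, c):
    """3-to-2 carry-save adder: returns (s, cy) with a + b + c == s + cy."""
    s = a ^ b ^ c                              # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1    # majority bits, moved up one place
    return s, cy

def align_and_reduce(mantissas, shifts):
    """Right-shift four mantissas, then reduce them to a sum/carry pair."""
    assert len(mantissas) == 4 and len(shifts) == 4

    # Aligners 0-3: always shift right; the bits shifted out are discarded.
    aligned = [m >> s for m, s in zip(mantissas, shifts)]

    # 4-to-2 reduction modeled as two cascaded 3-to-2 carry-save adders.
    s1, c1 = csa(aligned[0], aligned[1], aligned[2])
    s, cy = csa(s1, c1, aligned[3])
    return s, cy
```

No carry-propagate addition is performed here; the pair (s, cy) only satisfies s + cy == sum of the aligned mantissas, which corresponds to the sum and carry handed over as the VEC output.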
The vector instructions FIPR and FTRV were defined as optional instructions,
and the hardware should be optimized for the configuration without the optional
instructions. Further, if we optimized the hardware for all of the instructions,
we could not share hardware properly because the latency of FIPR and FTRV
differs from that of the others. Therefore, using a variable-length pipeline
architecture, the E0 stage is inserted only when FIPR or FTRV is executed,
although this causes a one-cycle stall when an FE-category instruction other
than FIPR and FTRV is issued right after an FIPR or an FTRV, as illustrated in
Fig. 3.24.
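
The one-cycle stall can be reproduced with a toy timing model. The sketch below is a simplification under stated assumptions, not the actual pipeline control: it assumes one instruction is issued per cycle in order, that an ordinary FE-category instruction reaches EX one cycle after issue, that FIPR and FTRV pass through one extra E0 cycle first, and that EX accepts one instruction per cycle. FTRV is treated as a single issue slot, and FADD is used only as a stand-in for an FE-category instruction. The name ex_entry_cycles is hypothetical.

```python
VECTOR_OPS = {"FIPR", "FTRV"}   # instructions that take the extra E0 stage

def ex_entry_cycles(program):
    """Return a list of (op, issue_cycle, ex_cycle, stall_cycles)."""
    schedule = []
    next_issue = 0
    last_ex = -1                 # cycle in which EX was last occupied
    for op in program:
        extra = 1 if op in VECTOR_OPS else 0   # E0 inserted before EX
        issue = next_issue
        ex = issue + 1 + extra
        stall = 0
        if ex <= last_ex:
            # EX is still taken by the preceding instruction: delay this one.
            stall = last_ex + 1 - ex
            issue += stall
            ex += stall
        schedule.append((op, issue, ex, stall))
        last_ex = ex
        next_issue = issue + 1
    return schedule
```

In this model, ex_entry_cycles(["FIPR", "FADD", "FADD"]) stalls only the first FADD by one cycle, because it would otherwise reach EX in the same cycle as the FIPR that went through E0. Back-to-back FIPR instructions do not stall each other, since both take the extra stage.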