Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems - page 50

Fig. 3.23 Structure of FPU VEC block

[Block diagram. Labels recoverable from the figure: inputs Vector-A and Vector-B; pipeline stage labels ID, E0, and EX; eight MSB Adj. blocks; Exp. Adders 0-3; Multiplier Arrays 0-3; Exp. Diffs. 01, 02, 03, 12, 13, and 23; Max. Exp.; CPA0-3; MUX0-3 and EMUX; four Dec. blocks; Aligners 0-3; 4-to-2 Reduction Array; outputs VEC output (Exponent) and VEC output (Mantissa).]

At the E0 stage, Multiplier Arrays 0-3 and Exp. Adders 0-3 produce the mantissas and exponents of the four intermediate products, respectively. Since the FIPR and FTRV definitions allow an error of "$2^{E-25}$ + rounding error of result," the multipliers need not produce an exact value, and the multiplier arrays can be made smaller by omitting lower-bit calculations as long as the resulting error stays within this allowance. Then, Exp. Diffs. 01, 02, 03, 12, 13, and 23 generate the exponent differences for all six pairs of products, Max. Exp. determines the maximum exponent from the signs of the six differences, and MUX0-3 each select either one of the six differences or zero as the amount by which a mantissa is aligned to the mantissa of the maximum-exponent product. Zero is selected for the maximum-exponent product itself. Further, EMUX selects the maximum exponent as the exponent of the VEC output.
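The exponent handling at the E0 stage can be sketched in software as follows. This is only a behavioral illustration under stated assumptions, not the actual circuit: the function name `e0_exponent_select` and its return format are hypothetical, exponents are plain integers, ties are resolved in favor of the lowest-numbered product, and a negative difference is negated here directly rather than at the decoder.

```python
from itertools import combinations


def e0_exponent_select(exps):
    """Behavioral sketch of Exp. Diffs., Max. Exp., MUX0-3 and EMUX.

    exps: exponents of the four intermediate products (plain ints).
    Returns (index of maximum-exponent product, maximum exponent,
             list of four non-negative right-shift amounts).
    """
    assert len(exps) == 4

    # Exp. Diff. ij = exps[i] - exps[j] for the pairs 01, 02, 03, 12, 13, 23.
    diff = {(i, j): exps[i] - exps[j] for i, j in combinations(range(4), 2)}

    def ge(i, j):
        # exps[i] >= exps[j], judged only from the sign of a stored difference.
        if i < j:
            return diff[(i, j)] >= 0
        return diff[(j, i)] <= 0

    # Max. Exp.: the first product that is >= each of the other three.
    max_idx = next(
        i for i in range(4) if all(ge(i, j) for j in range(4) if j != i)
    )

    # MUX0-3: pick the difference against the maximum, or zero for the maximum.
    shifts = []
    for k in range(4):
        if k == max_idx:
            shifts.append(0)
        elif max_idx < k:
            shifts.append(diff[(max_idx, k)])   # stored difference is non-negative
        else:
            shifts.append(-diff[(k, max_idx)])  # stored difference is non-positive

    # EMUX: the maximum exponent becomes the exponent of the VEC output.
    return max_idx, exps[max_idx], shifts
```

For example, with product exponents 5, 9, 2, and 9, the sketch picks product 1 as the maximum and yields shift amounts 4, 0, 7, and 0.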

At the EX stage, Aligners 0-3 align the mantissas by the four selected differences. Each difference can be positive or negative depending on which product has the maximum exponent, but the shift direction for the alignment is always to the right, and the necessary adjustment is made when the difference is decoded. A 4-to-2 Reduction Array then reduces the four aligned mantissas to two values, the sum and carry of the mantissa of the VEC output. The VEC output is received by the MAIN block at the MUX of the EX stage.
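The alignment and 4-to-2 reduction can be illustrated with a small carry-save model. This is a sketch under simplifying assumptions: mantissas are non-negative integers (signs are ignored), the shift amounts are already non-negative, and the 4-to-2 reduction is modeled as two cascaded 3-to-2 carry-save steps, which gives the same arithmetic result but is not necessarily how the actual array is built. All function names are hypothetical.

```python
def carry_save_3_to_2(a, b, c):
    """3-to-2 carry-save step on non-negative ints: a + b + c == s + cy."""
    s = a ^ b ^ c                              # bitwise sum without carries
    cy = ((a & b) | (a & c) | (b & c)) << 1    # majority bits, weighted by 2
    return s, cy


def reduce_4_to_2(m0, m1, m2, m3):
    """Reduce four operands to a (sum, carry) pair with the same total."""
    s1, c1 = carry_save_3_to_2(m0, m1, m2)
    return carry_save_3_to_2(s1, c1, m3)


def align_and_reduce(mantissas, shifts):
    """Right-shift each mantissa by its shift amount, then reduce 4-to-2.

    Bits shifted out on the right are simply discarded in this sketch.
    """
    aligned = [m >> s for m, s in zip(mantissas, shifts)]
    return reduce_4_to_2(*aligned)
```

The sum and carry outputs are left unadded here, mirroring the text: the final carry-propagate addition is assumed to happen after the VEC output is handed over.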

The vector instructions FIPR and FTRV were defined as optional instructions, so the hardware should be optimized for the configuration without them. Further, if the hardware were optimized for all of the instructions together, it could not be shared properly because the latency of FIPR and FTRV differs from that of the other instructions. Therefore, a variable-length pipeline architecture is used in which the E0 stage is inserted only when FIPR or FTRV is executed, although this causes a one-cycle stall when an FE-category instruction other than FIPR and FTRV is issued right after an FIPR or an FTRV, as illustrated in Fig. 3.24.
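The stall rule stated above can be expressed as a toy counting model. It is only a sketch: it assumes the input is a list of FE-category instruction mnemonics that would otherwise issue in consecutive cycles, it ignores every other hazard, and the function name and the example stream are hypothetical.

```python
VECTOR_OPS = {"FIPR", "FTRV"}  # instructions that use the extra E0 stage


def count_e0_stalls(fe_stream):
    """Count one-cycle stalls caused by the variable-length pipeline.

    fe_stream: FE-category mnemonics, assumed to issue back to back.
    A stall is counted whenever an instruction other than FIPR/FTRV
    directly follows an FIPR or an FTRV.
    """
    stalls = 0
    for prev, cur in zip(fe_stream, fe_stream[1:]):
        if prev in VECTOR_OPS and cur not in VECTOR_OPS:
            stalls += 1
    return stalls
```

In this model, a run of consecutive FIPR or FTRV instructions incurs no stall within the run; the single stall appears only where the stream switches back to another FE-category instruction.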

