Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems


FTRV, the out-of-order completions of FDIV and FSQRT, and proper extensions of the register files and load/store/transfer width. Further parallelization could have been one of the next approaches, but we chose instead to raise the operating frequency. The main reason was that the CPU side had to take this approach for general applications with low parallelism, as described in Sect. 3.1.2. However, allowing the latencies of the FPU instructions to become 1.5 times longer caused serious performance degradation. Therefore, we enhanced the architecture and microarchitecture to reduce the latencies efficiently.

3.1.6.1 Floating-Point Architecture Extension

The FDIV and FSQRT of the SH-4 were already long-latency instructions, and making their latencies 1.5 times longer on the SH-X could cause serious performance degradation. The long latencies came mainly from the strict operation definitions of the ANSI/IEEE 754 floating-point standard, which require an accurate value to be kept before rounding. However, there was another way if we allowed proper inaccuracies.

A floating-point square-root reciprocal approximate (FSRRA) was defined as an elementary function instruction to replace the FDIV, FSQRT, or their combination, so that the long-latency instructions are no longer needed. 3D graphics applications in particular require many reciprocal and square-root reciprocal values, so the FSRRA is highly effective for them. Further, 3D graphics require less accuracy, and single precision without strict rounding is accurate enough. The maximum error of the FSRRA is $\pm 2^{E-21}$, where $E$ is the exponent value of an FSRRA result. The FSRRA definition is as follows:

$$\mathrm{FR}n = \frac{1}{\sqrt{\mathrm{FR}n}}.$$
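The FSRRA definition and its use as a replacement for FDIV and FSQRT can be illustrated with a small behavioral model. This is only a sketch: the function names (`fsrra`, `fsrra_error_bound`, `normalize`, `reciprocal`, `square_root`) are hypothetical, and the model computes the exact value rounded to single precision, so it is more accurate than the approximation the hardware is allowed to return. It does not represent the actual SH-X algorithm.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fsrra(x: float) -> float:
    """Hypothetical model of FSRRA: roughly 1/sqrt(x) for x > 0."""
    return to_f32(1.0 / math.sqrt(x))


def fsrra_error_bound(result: float) -> float:
    """Stated maximum error 2**(E - 21), E = unbiased exponent of the result."""
    _, e = math.frexp(result)      # result = m * 2**e with 0.5 <= m < 1
    exponent = e - 1               # IEEE-style exponent: result = 1.f * 2**E
    return 2.0 ** (exponent - 21)


def normalize(v):
    """Normalize a 3D vector with one reciprocal square root and no division."""
    r = fsrra(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return tuple(to_f32(c * r) for c in v)


def reciprocal(x: float) -> float:
    """1/|x| without FDIV, computed as 1/sqrt(x*x)."""
    return fsrra(x * x)


def square_root(x: float) -> float:
    """sqrt(x) without FSQRT, computed as x * (1/sqrt(x)) for x > 0."""
    return to_f32(x * fsrra(x))
```

For instance, `normalize((3.0, 0.0, 4.0))` needs three multiplications by a single `fsrra` result instead of one square root and three divisions, which is the kind of saving the text describes for 3D graphics.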

A floating-point sine and cosine approximate (FSCA) was defined as another popular elementary function instruction. Once the FSRRA was introduced, the extra hardware required for the FSCA was not large. The most popular definition of the trigonometric functions uses the radian as the angular unit. However, the period in radians is 2π, which cannot be expressed by a simple binary number. Therefore, the FSCA uses a fixed-point number of rotations as the angular expression. The number consists of a 16-bit integer part and a 16-bit fraction part. Because of periodicity, the integer part is not needed to calculate the sine and cosine values, and the 16-bit fraction part provides a sufficient resolution of 360/65,536 = 0.0055°. The angular source operand is set in the CPU-FPU communication register FPUL because the angular value is a fixed-point number. The maximum error of the FSCA is $\pm 2^{-22}$, which is an absolute value and not related to the result value. The FSCA definition is as follows:

$$\mathrm{FR}n = \sin(2\pi \cdot \mathrm{FPUL}), \quad \mathrm{FR}[n+1] = \cos(2\pi \cdot \mathrm{FPUL}).$$
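The fixed-point angle format can be made concrete with a short behavioral model. This is a sketch under stated assumptions: `fsca` and `degrees_to_fpul` are hypothetical helper names, and the model uses the double-precision library sine and cosine rather than whatever approximation the hardware implements.

```python
import math


def degrees_to_fpul(deg: float) -> int:
    """Convert degrees to a 32-bit fixed-point number of rotations (16.16)."""
    return round(deg / 360.0 * 65536.0) & 0xFFFFFFFF


def fsca(fpul: int):
    """Hypothetical model of FSCA: returns (sin, cos) for the angle in FPUL.

    Only the 16-bit fraction is used; the 16-bit integer part counts whole
    rotations and is dropped because sine and cosine are periodic.
    """
    fraction = fpul & 0xFFFF
    angle = 2.0 * math.pi * fraction / 65536.0
    return math.sin(angle), math.cos(angle)


# One step of the 16-bit fraction, in degrees (about 0.0055).
RESOLUTION_DEG = 360.0 / 65536.0
```

A quarter rotation is `0x4000` in this format, so `fsca(0x4000)` models the sine and cosine of 90°, and adding whole rotations such as `0x00010000` leaves the result unchanged.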
