FTRV, the out-of-order completion of FDIV and FSQRT, and appropriate extensions
of the register files and the load/store/transfer width. Further parallelization
could have been the next step, but we chose instead to raise the operating
frequency. The main reason was that the CPU side had to take this approach for
general applications with low parallelism, as described in Sect. 3.1.2. However,
accepting FPU instruction latencies 1.5 times as long would have caused serious
performance degradation. Therefore, we enhanced the architecture and
microarchitecture to reduce the latencies efficiently.
3.1.6.1 Floating-Point Architecture Extension
The FDIV and FSQRT of the SH-4 were already long-latency instructions, and
latencies 1.5 times as long on the SH-X could cause serious performance
degradation. The long latencies stemmed mainly from the strict operation
definitions of the ANSI/IEEE 754 floating-point standard, which require keeping
an accurate value before rounding. However, another approach was possible if we
allowed appropriate inaccuracy.
A floating-point square-root reciprocal approximate (FSRRA) was defined as an
elementary function instruction to replace the FDIV, the FSQRT, or their
combination, so that the long-latency instructions are no longer needed. 3D
graphics applications in particular require many reciprocal and square-root
reciprocal values, so the FSRRA is highly effective for them. Furthermore, 3D
graphics tolerate lower accuracy, and single precision without strict rounding
is accurate enough. The maximum error of the FSRRA is $\pm 2^{E-21}$, where E is
the exponent value of an FSRRA result. The FSRRA definition is as follows:

$$\mathrm{FR}n = \frac{1}{\sqrt{\mathrm{FR}n}}.$$
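The way an FSRRA result can stand in for a divide or a square root follows from two identities that hold for x > 0: 1/x = (1/√x)² and √x = x · (1/√x). The sketch below is only a software model for illustration, not the hardware algorithm: `fsrra_model` is a hypothetical stand-in that computes a correctly rounded single-precision value, whereas the real instruction is only specified to stay within the error bound quoted above. The helper names and the use of `math.frexp` to obtain the exponent E are assumptions of this example.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fsrra_model(x: float) -> float:
    """Hypothetical software stand-in for FSRRA: 1/sqrt(x) in single precision."""
    return to_f32(1.0 / math.sqrt(x))


def reciprocal_via_fsrra(x: float) -> float:
    """1/x for x > 0, computed as (1/sqrt(x))**2: one FSRRA plus one multiply."""
    r = fsrra_model(x)
    return to_f32(r * r)


def sqrt_via_fsrra(x: float) -> float:
    """sqrt(x) for x > 0, computed as x * (1/sqrt(x)): one FSRRA plus one multiply."""
    return to_f32(x * fsrra_model(x))


def fsrra_error_bound(result: float) -> float:
    """Bound 2**(E - 21), where E is the unbiased exponent of the result."""
    _, e = math.frexp(result)   # result = m * 2**e with 0.5 <= |m| < 1
    exponent = e - 1            # exponent for the 1.xxx * 2**E normal form
    return 2.0 ** (exponent - 21)


def normalize(v):
    """Scale a 3D vector to unit length using a single FSRRA-style operation."""
    inv_len = fsrra_model(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return tuple(to_f32(c * inv_len) for c in v)
```

Vector normalization, as in `normalize`, is the typical 3D graphics case: it needs 1/√(x² + y² + z²) directly, so neither a divide nor a square root is issued.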
A floating-point sine and cosine approximate (FSCA) was defined as another
popular elementary function instruction. Once the FSRRA had been introduced,
the extra hardware needed for the FSCA was not large. The most common definition
of the trigonometric functions uses the radian as the angular unit. However, the
period in radians is 2π, which cannot be expressed as a simple binary number.
Therefore, the FSCA uses a fixed-point number of rotations as the angular
expression. The number consists of a 16-bit integer part and a 16-bit fraction
part. By periodicity, the integer part is not needed to calculate the sine and
cosine values, and the 16-bit fraction part provides a sufficient resolution of
360°/65,536 = 0.0055°. The angular source operand is placed in the CPU-FPU
communication register FPUL because the angular value is a fixed-point number.
The maximum error of the FSCA is $\pm 2^{-22}$, which is an absolute value and
not related to the result value. The FSCA definition is as follows:

$$\mathrm{FR}n = \sin(2\pi \cdot \mathrm{FPUL}), \quad \mathrm{FR}[n+1] = \cos(2\pi \cdot \mathrm{FPUL}).$$
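The fixed-point angle format can be made concrete with a small software model. The sketch below is illustrative only and says nothing about how the hardware computes its result: `fsca_model` and `degrees_to_fpul` are hypothetical names, FPUL is modeled as an unsigned 32-bit integer whose low 16 bits are the fraction of a rotation, and the double-precision `math.sin` and `math.cos` stand in for the approximate hardware outputs.

```python
import math
from typing import Tuple

FRACTION_BITS = 16
FRACTION_SCALE = 1 << FRACTION_BITS          # 65,536 steps per full rotation
RESOLUTION_DEG = 360.0 / FRACTION_SCALE      # about 0.0055 degrees per step


def fsca_model(fpul: int) -> Tuple[float, float]:
    """Return (sin, cos) for a 16.16 fixed-point number of rotations."""
    # The 16-bit integer part counts whole rotations and is dropped,
    # because sine and cosine are periodic.
    fraction = fpul & (FRACTION_SCALE - 1)
    angle = 2.0 * math.pi * fraction / FRACTION_SCALE
    return math.sin(angle), math.cos(angle)


def degrees_to_fpul(degrees: float) -> int:
    """Hypothetical helper: encode an angle in degrees as a 32-bit FPUL value."""
    return round(degrees / 360.0 * FRACTION_SCALE) & 0xFFFFFFFF
```

For example, `degrees_to_fpul(90.0)` yields `0x4000`, a quarter rotation, and adding any multiple of `0x10000` to that value leaves the result of `fsca_model` unchanged. Because a full turn maps to a power of two, wrapping the angle is a simple bit mask rather than a reduction modulo 2π.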