Hardware Reference
only the front bank, but some newly defined instructions use both the front and back
banks. The SH-4 uses the front-bank registers as eight pairs or four length-4 vectors
as well as 16 registers and uses the back-bank registers as eight pairs or a four-by-
four matrix. They were defined as follows:
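$$\mathrm{DR}n = (\mathrm{FR}n,\ \mathrm{FR}[n+1]) \quad (n: 0, 2, 4, 6, 8, 10, 12, 14),$$

$$\mathrm{FV}0 = \begin{pmatrix} \mathrm{FR}0 \\ \mathrm{FR}1 \\ \mathrm{FR}2 \\ \mathrm{FR}3 \end{pmatrix},\quad
\mathrm{FV}4 = \begin{pmatrix} \mathrm{FR}4 \\ \mathrm{FR}5 \\ \mathrm{FR}6 \\ \mathrm{FR}7 \end{pmatrix},\quad
\mathrm{FV}8 = \begin{pmatrix} \mathrm{FR}8 \\ \mathrm{FR}9 \\ \mathrm{FR}10 \\ \mathrm{FR}11 \end{pmatrix},\quad
\mathrm{FV}12 = \begin{pmatrix} \mathrm{FR}12 \\ \mathrm{FR}13 \\ \mathrm{FR}14 \\ \mathrm{FR}15 \end{pmatrix},$$

$$\mathrm{XD}n = (\mathrm{XF}n,\ \mathrm{XF}[n+1]) \quad (n: 0, 2, 4, 6, 8, 10, 12, 14),$$

$$\mathrm{XMTRX} = \begin{pmatrix}
\mathrm{XF}0 & \mathrm{XF}4 & \mathrm{XF}8 & \mathrm{XF}12 \\
\mathrm{XF}1 & \mathrm{XF}5 & \mathrm{XF}9 & \mathrm{XF}13 \\
\mathrm{XF}2 & \mathrm{XF}6 & \mathrm{XF}10 & \mathrm{XF}14 \\
\mathrm{XF}3 & \mathrm{XF}7 & \mathrm{XF}11 & \mathrm{XF}15
\end{pmatrix}.$$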
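The register views defined above can be pictured with a small software model. The following is only an illustrative sketch: the names `front`, `back`, `pair`, `vector`, and `xmtrx` are hypothetical, the banks are modeled as plain Python lists, and only the grouping of registers is shown, not the hardware data formats.

```python
# Hypothetical model of the two 16-register banks (not a hardware-accurate emulator).
front = [float(i) for i in range(16)]        # stands in for FR0..FR15
back = [float(100 + i) for i in range(16)]   # stands in for XF0..XF15


def pair(bank, n):
    """Pair view: DRn = (FRn, FR[n+1]) or XDn = (XFn, XF[n+1]), n even."""
    assert n % 2 == 0 and 0 <= n <= 14
    return (bank[n], bank[n + 1])


def vector(bank, n):
    """Length-4 vector view: FVn = (FRn, FR[n+1], FR[n+2], FR[n+3])."""
    assert n in (0, 4, 8, 12)
    return bank[n:n + 4]


def xmtrx(bank):
    """Four-by-four matrix view: row i, column j holds XF[i + 4*j]."""
    return [[bank[i + 4 * j] for j in range(4)] for i in range(4)]


print(pair(front, 2))      # the pair DR2
print(vector(front, 4))    # the vector FV4
print(xmtrx(back)[0])      # first row of XMTRX: XF0, XF4, XF8, XF12
```

In this model each column of the matrix view is four consecutive back-bank registers, which matches the layout of XMTRX given in the definition.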
As described above, an ordinary SIMD extension of the FPU was too expensive for an
embedded processor, so a different form of parallelism was applied to the SH-4. Much
of an FPU's hardware is devoted to mantissa alignment before the operation and to
normalization and rounding after it. In addition, FMAC, a popular FPU instruction,
requires three read ports and one write port. Consecutive FMAC operations are a
common way to accumulate multiple products; the inner product of two length-4
vectors is one such sequence and is common in 3D graphics programs. Therefore, a
floating-point inner-product instruction (FIPR) was defined to accelerate this
sequence with less hardware than SIMD would need. It takes two of the four length-4
vectors as input operands and overwrites the last register of one of the input
vectors with the result. The defining formula is as follows:
$$\mathrm{FR}[n+3] = \mathrm{FV}m \cdot \mathrm{FV}n \quad (m, n: 0, 4, 8, 12).$$
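The data flow of the formula above can be sketched in a few lines. This is a minimal model under stated assumptions: `fipr` and `fr` are hypothetical names, the front bank is a plain Python list, and Python's double-precision floats stand in for the hardware's arithmetic, so rounding behavior is not modeled.

```python
def fipr(fr, m, n):
    """Model of FR[n+3] = FVm . FVn; overwrites the last element of FVn."""
    assert m in (0, 4, 8, 12) and n in (0, 4, 8, 12)
    # The whole sum is evaluated before the store, so the old FR[n+3]
    # still takes part in the product as the fourth element of FVn.
    fr[n + 3] = sum(fr[m + i] * fr[n + i] for i in range(4))


fr = [0.0] * 16
fr[0:4] = [1.0, 2.0, 3.0, 4.0]   # FV0
fr[4:8] = [5.0, 6.0, 7.0, 8.0]   # FV4
fipr(fr, 0, 4)                   # m = 0, n = 4
print(fr[4:8])                   # FR7 now holds 1*5 + 2*6 + 3*7 + 4*8 = 70.0
```

Note that FV0 is left untouched, while the fourth element of FV4 is replaced by the result.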
This modifying-type definition is similar to that of the other instructions. However,
for a length-3 vector operation, which is also popular, you can get the result without
destroying the inputs by setting one of the fourth elements of the input vectors to zero.
The FIPR produces only one result, one fourth as many as a four-way SIMD, and can
therefore save normalization and rounding hardware. It requires eight input registers
and one output register, fewer than the 12 input and four output registers of a
four-way SIMD FMAC. Further, the FIPR takes much less time than the equivalent
sequence of one FMUL and three FMACs and requires only a small number of registers to
sustain peak performance. As a result, the hardware was estimated to be half that of
the four-way SIMD.
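The length-3 case and the equivalent FMUL/FMAC sequence can be illustrated as follows. This is a sketch only: `fipr`, `fmul_fmac_sequence`, and `fr` are hypothetical names, the choice to zero the fourth element of FVm (rather than FVn) is an assumption made for the example, and Python floats do not reproduce the hardware's rounding.

```python
def fipr(fr, m, n):
    """Model of FR[n+3] = FVm . FVn; overwrites the last element of FVn."""
    fr[n + 3] = sum(fr[m + i] * fr[n + i] for i in range(4))


def fmul_fmac_sequence(a, b):
    """Equivalent sequence: one multiply followed by three multiply-accumulates."""
    acc = a[0] * b[0]               # FMUL
    for i in range(1, 4):
        acc = a[i] * b[i] + acc     # FMAC
    return acc


fr = [0.0] * 16
fr[0:4] = [1.0, 2.0, 3.0, 0.0]   # FV0: length-3 vector, fourth element set to zero
fr[4:8] = [4.0, 5.0, 6.0, 9.9]   # FV4: length-3 vector, fourth element is scratch

expected = fmul_fmac_sequence(fr[0:4], fr[4:8])
fipr(fr, 0, 4)

print(fr[7])                     # 1*4 + 2*5 + 3*6 = 32.0
print(fr[0:3], fr[4:7])          # the length-3 inputs are unchanged
```

Because the zeroed element cancels whatever sits in the other vector's fourth slot, that slot is free to receive the result, and the three meaningful elements of both inputs survive the operation.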