[Fig. 3.31 Four-way SIMD vs. FIPR. Diagram labels: 4-way SIMD (FMUL, FMAC, FMAC, FMAC); FIPR; "20 operations for peak throughput"; "5 operations"; "Result is available here".]
[Fig. 3.32 FSRRA vs. equivalent sequence of FSQRT and FDIV. Timing diagram labels: FSQRT, FSRRA, FDIV; cycle counts 4 and 11 (post process); "Result is available here".]
[Fig. 3.33 FDIV vs. equivalent sequence of FSRRA and FMUL. Timing diagram labels: FDIV, FSRRA, FMUL; cycle counts 4 and 11 (post process); "Resource is available here"; "Result is available here".]
five cycles that are only one-quarter and approximately one-fifth of those of the
equivalent sequences, respectively. The FSRRA is much faster while using a similar
amount of hardware resources.
The FSRRA can also be used to compute a reciprocal when it is followed by an FMUL,
as shown in Fig. 3.33. The FDIV occupies the MAIN FPU for 2 cycles and the special
resources for 13 cycles, and takes 17 cycles to produce the result. The FSRRA and
FMUL sequence, on the other hand, occupies the MAIN FPU for 2 cycles and the special
resources for 3 cycles, and takes 10 cycles to produce the result. The FSRRA and
FMUL sequence is therefore preferable to the FDIV if an application does not require
a result that conforms to the IEEE standard; 3D graphics is one such application.
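
The trade-off above can be illustrated with a small numerical sketch. It assumes that the FSRRA and FMUL sequence forms the reciprocal by squaring the reciprocal square root, 1/x = (1/sqrt(x))^2 for x > 0, which is one plausible reading of Fig. 3.33 rather than something the text spells out. The function names are hypothetical, and rounding to single precision is only a stand-in for the real instructions, whose exact error bounds are not given here. The cycle counts are the ones quoted in the text.

```python
import math
import struct

# Cycle figures quoted in the text (MAIN FPU occupancy, special-resource
# occupancy, and latency until the result is available).
FDIV_CYCLES = {"main_fpu": 2, "special": 13, "latency": 17}
FSRRA_FMUL_CYCLES = {"main_fpu": 2, "special": 3, "latency": 10}


def to_f32(value: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def fsrra_model(x: float) -> float:
    """Hypothetical model of FSRRA: 1/sqrt(x) rounded to single precision.

    Assumption: only positive inputs are handled; the behavior of the real
    instruction for zero or negative inputs is not modeled.
    """
    if x <= 0.0:
        raise ValueError("this sketch only models positive inputs")
    return to_f32(1.0 / math.sqrt(x))


def reciprocal_via_fsrra(x: float) -> float:
    """Reciprocal as FSRRA followed by FMUL (assumed: square the result)."""
    r = fsrra_model(to_f32(x))
    return to_f32(r * r)


def reciprocal_via_fdiv(x: float) -> float:
    """Reciprocal as a single correctly rounded single-precision divide."""
    return to_f32(1.0 / to_f32(x))


def relative_difference(x: float) -> float:
    """Relative gap between the two reciprocals for the same input."""
    exact = reciprocal_via_fdiv(x)
    return abs(reciprocal_via_fsrra(x) - exact) / exact


# Ratios derived purely from the cycle counts quoted in the text.
latency_ratio = FDIV_CYCLES["latency"] / FSRRA_FMUL_CYCLES["latency"]
special_ratio = FDIV_CYCLES["special"] / FSRRA_FMUL_CYCLES["special"]

if __name__ == "__main__":
    for x in (0.1, 3.0, 7.5, 1234.5):
        print(x, reciprocal_via_fsrra(x), reciprocal_via_fdiv(x),
              relative_difference(x))
    print("latency ratio:", latency_ratio)
    print("special-resource occupancy ratio:", special_ratio)
```

In this model the two results can differ in the last bits because the squared value carries two rounding steps, which is the kind of deviation from a correctly rounded IEEE divide that the text says 3D graphics can tolerate. With the quoted figures, the sequence shortens the latency from 17 to 10 cycles and the special-resource occupancy from 13 to 3 cycles.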
 