The rounding rule of conventional floating-point operations is strictly defined
by the ANSI/IEEE 754 floating-point standard. The rule is to keep the exact value
before rounding. However, each instruction performs its own rounding, and the
accumulated rounding error sometimes becomes very serious. Therefore, where
necessary, a program must avoid such serious rounding errors itself, without relying
on the hardware. The sequence of one FMUL and three FMACs can also cause a serious
rounding error.
For example, the following formula results in zero if we add the terms in the order
of the formula by FADD instructions:
[...]
However, the exact value is [...], and the error is [...] for the formula, which
causes the worst error of 2^(−[...]) times the maximum term.
We can get the exact value if we change the operation order properly. The
floating-point standard defines the rule of each operation but does not define the
result of the formula, so either result conforms to the standard. Since the FIPR
operation is not defined by the standard, we defined its maximum error as
“2^(E−[...]) + rounding error of result” to make it better than or equal to the
average and worst-case errors of the equivalent sequence that conforms to the
standard, where E is the maximum exponent of the four products.
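
The order dependence described above can be reproduced with a small single-precision model. The following is only a sketch: the four terms are values chosen here for illustration (an assumption, not necessarily those of the original formula), and `f32`, `fadd`, and `sum_in_order` are hypothetical helpers that model an FADD as a binary64 addition followed by one rounding to binary32.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def fadd(a: float, b: float) -> float:
    """Model one single-precision add: add in binary64, then round to binary32."""
    return f32(a + b)

def sum_in_order(terms):
    """Accumulate the terms left to right, rounding after every add."""
    acc = terms[0]
    for t in terms[1:]:
        acc = fadd(acc, t)
    return acc

# Illustrative values (assumed): SMALL is just below half an ulp of BIG,
# so adding it to BIG is rounded away each time.
BIG = 2.0 ** 127
SMALL = float.fromhex("0x1.fffffep102")

in_order = sum_in_order([BIG, SMALL, SMALL, -BIG])      # both SMALL terms are lost
reordered = fadd(fadd(BIG, -BIG), fadd(SMALL, SMALL))   # cancel the large terms first
exact = 2 * SMALL                                       # exactly representable in binary32

relative_error = (exact - in_order) / BIG               # error relative to the largest term

print(in_order, reordered == exact, relative_error)
```

With these assumed values, the in-order sum loses both small terms, while the reordered sum is exact. Every individual add is correctly rounded in both cases, which is why both results are consistent with a per-operation rounding rule.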
A length-4 vector transformation was also a popular operation in 3D graphics,
and a floating-point transform vector instruction (FTRV) was defined. It required 20
registers (16 for the matrix and 4 for the vector) to specify the operands in a
modification-type definition. Therefore, the defining formula is as follows, using a
four-by-four matrix of all the back-bank registers, XMTRX, and one of the four
front-bank vector registers, FV0, FV4, FV8, and FV12:
FVn = XMTRX · FVn (n: 0, 4, 8, 12).
Since a 3D object consists of many polygons expressed by length-4 vectors,
and the same XMTRX is applied to many of the vectors of a 3D object, the XMTRX
is not changed very often and is well suited to the back bank.
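
The usage pattern described above, one matrix loaded once and then applied to many vectors, can be sketched as follows. This is a minimal model, not the instruction's definition: `to_xf` and `ftrv` are hypothetical helpers, the register layout (row i of XMTRX held in XFi, XFi+4, XFi+8, XFi+12) is an assumption, and plain Python floats are used, so single-precision rounding is not modeled.

```python
def to_xf(matrix):
    """Flatten a 4x4 row-major matrix into the assumed XF0..XF15 order."""
    return [matrix[i][k] for k in range(4) for i in range(4)]

def ftrv(xf, fv):
    """Return XMTRX . fv, taking row i of XMTRX as (XF[i], XF[i+4], XF[i+8], XF[i+12])."""
    return [sum(xf[i + 4 * k] * fv[k] for k in range(4)) for i in range(4)]

# A translation by (10, 20, 30) in homogeneous coordinates.
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
xmtrx = to_xf(translate)  # set up once, like loading the back bank

# The same matrix is then applied to every vertex of the object.
triangle = [
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
moved = [ftrv(xmtrx, v) for v in triangle]
print(moved)
```

The matrix setup is outside the per-vertex loop, which mirrors why a rarely changed XMTRX fits a register bank that is not addressed directly by the vector operations.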
The FTRV operation was implemented as four inner-product operations by dividing
the XMTRX into four vectors properly, and its maximum error is the same as that of
the FIPR. It could be replaced by four inner-product instructions if we made the
input and output registers different, so that the input value is kept intact during
the transformation. The formula would become as follows:
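FRn = (XF0, XF4, XF8, XF12) · FVm
FRn+1 = (XF1, XF5, XF9, XF13) · FVm
FRn+2 = (XF2, XF6, XF10, XF14) · FVm
FRn+3 = (XF3, XF7, XF11, XF15) · FVm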
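
The need for separate input and output registers can be illustrated with a short sketch. All names here (`rows`, `inner`, `transform_separate`, `transform_in_place_naive`) are hypothetical, the register layout is the same assumption as in the formulas (each row ends in XF12 to XF15), and rounding is not modeled.

```python
def to_xf(matrix):
    """Flatten a 4x4 row-major matrix into the assumed XF0..XF15 order."""
    return [matrix[i][k] for k in range(4) for i in range(4)]

def rows(xf):
    """Split XF0..XF15 into four vectors: row i is (XF[i], XF[i+4], XF[i+8], XF[i+12])."""
    return [[xf[i + 4 * k] for k in range(4)] for i in range(4)]

def inner(a, b):
    """Length-4 inner product."""
    return sum(x * y for x, y in zip(a, b))

def transform_separate(xf, fv_in):
    """Four inner products writing to a separate output vector."""
    return [inner(r, fv_in) for r in rows(xf)]

def transform_in_place_naive(xf, fv):
    """Four inner products overwriting the input element by element."""
    for i, r in enumerate(rows(xf)):
        fv[i] = inner(r, fv)  # later rows read elements that are already overwritten
    return fv

# A matrix that swaps the first two components.
swap_xy = to_xf([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

good = transform_separate(swap_xy, [1.0, 2.0, 3.0, 4.0])
bad = transform_in_place_naive(swap_xy, [1.0, 2.0, 3.0, 4.0])
print(good, bad)
```

In the naive in-place version the second inner product reads the first component after it has been overwritten, so the original value is lost. A single transform operation that reads the whole vector before writing any result does not have this problem.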
The above inner-product operations were different from that of the FIPR in the
register usage, and we could define another inner-product instruction to fit the above