Cryptography Reference
In-Depth Information
of the operands. Therefore, the column-wise algorithm should be preferred, as memory
writes only occur at the foot of each column. A disadvantage of both methods is that
they reload the same operands many times to calculate different partial products. This
is obviously not an optimal approach, especially if a processor has enough registers
available to store some of these values for later use.
It has recently been suggested that a hybrid method, which combines features of
both techniques, might be preferable, particularly in a setting where the processor has
many available general-purpose registers (Gura et al. 2004). This method was first
described in the context of the ATmega-128L 8-bit processor, which has 32 registers.
In the same setting, Uhsadel et al. (2007) have presented better results. The following
sections present a description of an optimized method of hybrid multiplication that
simplifies the whole approach. This improved technique can be applied to a wide range
of embedded processors and is superior to the earlier proposals.
9.8.1.3 Improved Hybrid Multiplication
Most efficient implementations of multiprecision multiplication use the Comba method,
which is based on the column-wise technique (Figure 9.3; Comba 1990). Each W -bit
pair of digits is multiplied together to create one 2 W- bit partial product. This partial
product is accumulated in a triple register. The third register (called a “carry-catcher”)
is required to catch the carries that can arise as the full column is added up. After the
column addition, the least significant register of the triple register is written to memory
as a part of the result, and the other two registers shift down, representing the carry to
the next column. The maximum number that can arise in the third and most significant
register is bounded by the number of digits in the multiprecision multiplication. This is
likely to be much smaller than 2 W - 1, which is the maximum number that can be stored
in a register. A smaller register can, therefore, sometimes be used here.
The hybrid method maintains the advantages of the column-wise method while
exploiting any extra registers to avoid unnecessary reloads of data (Gura et al. 2004).
The idea is straightforward: perform the multiplication as if the word length of the
computer were actually d.W , and perform the d×d multiplication that arises in the
calculation of the larger partial product using the row-wise algorithm. Figure 9.5
illustrates the hybrid method for the case d = 2. Four word-size integers are loaded
into registers. As each large 2 × 2 partial product is calculated (represented by the large
outer boxes), it must be accumulated into registers. This figure illustrates the accumu-
lation of a particular partial product, and curved arrows indicate the carries. A naive
implementation would require five registers in this case to store the sum. However,
in the worst case this might require up to five add-with-carry instructions because,
as in integer addition, a carry-out is always a possibility that must be considered. The
original hybrid method (Gura et al. 2004), as well as the improved variant described
in Uhsadel et al. (2007), use a rather complex method to deal with this carry propaga-
tion problem based on the fact that the product of two words, plus two words, can-
not overflow a double-word-sized register since (2 W - 1)(2 W - 1) + (2 W - 1) + (2 W - 1) =
2 2 W - 1 < 2 2 W .
Search WWH ::




Custom Search