Hardware Reference
In-Depth Information
*(tiPR[AA]*clR[A] + tiPR[AC]*clR[C] + tiPR[AG]*clR[G] + tiPR[AT]*clR[T]);
clP[h++] = (tiPL[CA]*clL[A] + tiPL[CC]*clL[C] + tiPL[CG]*clL[G] + tiPL[CT]*clL[T])
*(tiPR[CA]*clR[A] + tiPR[CC]*clR[C] + tiPR[CG]*clR[G] + tiPR[CT]*clR[T]);
clP[h++] = (tiPL[GA]*clL[A] + tiPL[GC]*clL[C] + tiPL[GG]*clL[G] + tiPL[GT]*clL[T])
*(tiPR[GA]*clR[A] + tiPR[GC]*clR[C] + tiPR[GG]*clR[G] + tiPR[GT]*clR[T]);
clP[h++] = (tiPL[TA]*clL[A] + tiPL[TC]*clL[C] + tiPL[TG]*clL[G] + tiPL[TT]*clL[T])
*(tiPR[TA]*clR[A] + tiPR[TC]*clR[C] + tiPR[TG]*clR[G] + tiPR[TT]*clR[T]);
clL += 4;
clR += 4;
tiPL += 16;
tiPR += 16;
}
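The unrolled kernel above can also be read as two 4x4 matrix-vector products per site, multiplied element by element. The following is a minimal sketch of that reading, not the case study's own code. It assumes the Figure 4.32 index constants are row-major (A=0, C=1, G=2, T=3 and AA=0, AC=1, ..., TT=15), since the figure is not reproduced here, and that the enclosing loop, which is cut off in the listing, runs once per site for seq_length sites. The function name cond_like_down and the constant NUM_STATES are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: row-major index constants (A=0..T=3, AA=0..TT=15),
 * standing in for the values given in Figure 4.32. */
enum { NUM_STATES = 4 };

/* Hypothetical name. Loop form of the unrolled kernel: for each site,
 * clP[i] = (row i of tiPL . clL) * (row i of tiPR . clR). */
void cond_like_down(const float *tiPL, const float *tiPR,
                    const float *clL, const float *clR,
                    float *clP, size_t seq_length)
{
    assert(tiPL && tiPR && clL && clR && clP);

    size_t h = 0;
    for (size_t k = 0; k < seq_length; k++) {
        for (int i = 0; i < NUM_STATES; i++) {
            float left = 0.0f;
            float right = 0.0f;
            /* Same left-to-right summation order as the unrolled code. */
            for (int j = 0; j < NUM_STATES; j++) {
                left  += tiPL[i * NUM_STATES + j] * clL[j];
                right += tiPR[i * NUM_STATES + j] * clR[j];
            }
            clP[h++] = left * right;
        }
        /* Advance to the next site, as in the original pointer updates. */
        clL  += NUM_STATES;
        clR  += NUM_STATES;
        tiPL += NUM_STATES * NUM_STATES;
        tiPR += NUM_STATES * NUM_STATES;
    }
}
```

Written this way, each inner j-loop is a four-element dot product, which is the shape a vector multiply followed by a sum reduction would cover with the vector length set to 4.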
4.1 [25] <4.2, 4.3> Assume the constants shown in Figure 4.32. Show the code for MIPS and
VMIPS. Assume we cannot use scatter-gather loads or stores. Assume the starting
addresses of tiPL, tiPR, clL, clR, and clP are in RtiPL, RtiPR, RclL, RclR, and RclP, respectively.
Assume the VMIPS register length is user programmable and can be assigned by setting the
special register VL (e.g., li VL 4). To facilitate vector addition reductions, assume that we
add the following instructions to VMIPS:
SUMR.S Fd, Vs Vector Summation Reduction Single Precision:
This instruction performs a summation reduction on a vector register Vs, writing the sum
into scalar register Fd.
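As a rough scalar model of the SUMR.S semantics described above, the sketch below sums the first VL elements of a vector register into one scalar. The function name sumr_s and the array representation of a vector register are illustrative assumptions, and so is the left-to-right accumulation order, since the exercise does not say how the hardware orders the additions. MVL is set to 64, the usual VMIPS register length.

```c
#include <assert.h>

enum { MVL = 64 };  /* VMIPS vector registers hold 64 elements */

/* Hypothetical scalar model of "SUMR.S Fd, Vs": returns the sum of the
 * first vl elements of vs, where vl plays the role of the VL register.
 * Assumption: additions are performed left to right. */
float sumr_s(const float vs[MVL], int vl)
{
    assert(vl >= 0 && vl <= MVL);

    float fd = 0.0f;
    for (int i = 0; i < vl; i++) {
        fd += vs[i];
    }
    return fd;
}
```

With VL set to 4, an element-wise vector multiply followed by this reduction yields one of the four-term sums in the kernel.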
FIGURE 4.32 Constants and values for the case study.
4.2 [5] <4.2, 4.3> Assuming seq_length == 500, what is the dynamic instruction count for both
implementations?
4.3 [25] <4.2, 4.3> Assume that the vector reduction instruction is executed on the vector
functional unit, similar to a vector add instruction. Show how the code sequence lays out in
convoys assuming a single instance of each vector functional unit. How many chimes will
the code require? How many cycles per FLOP are needed, ignoring vector instruction issue
overhead?
4.4 [15] <4.2, 4.3> Now assume that we can use scatter-gather loads and stores (LVI and SVI).
Assume that tiPL, tiPR, clL, clR, and clP are arranged consecutively in memory. For example,
if seq_length==500, the tiPR array would begin 500 * 4 bytes after the tiPL array. How does this
affect the way you can write the VMIPS code for this kernel? Assume that you can
initialize vector registers with integers using the following technique which would, for example,
initialize vector register V1 with values (0,0,2000,2000):
 