Instruction              Clock cycles
All ALU instructions     1.0
Loads-stores             1.4
Conditional branches:
  Taken                  2.0
  Not taken              1.5
Jumps                    1.2
FP multiply              6.0
FP add                   4.0
FP divide                20.0
Load-store FP            1.5
Other FP                 2.0
Assume that 60% of the conditional branches are taken and that all instructions in the “other”
category of Figure A.28 are ALU instructions. Average the instruction frequencies of lucas and
swim to obtain the instruction mix.
A.4 [20] <A.9> Compute the effective CPI for MIPS using Figure A.28 and the table above.
Average the instruction frequencies of applu and art to obtain the instruction mix.
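The effective CPI asked for in these exercises is a frequency-weighted sum of the clock cycles in the table above. The following is a minimal sketch of that arithmetic, not a worked solution: the class names (`alu`, `cond_branch`, and so on) and the helper functions are hypothetical, and the two sample mixes are made-up placeholders, not the Figure A.28 frequencies, which must be substituted by the reader.

```python
# Clock cycles per instruction class, copied from the table above.
CYCLES = {
    "alu": 1.0,
    "load_store": 1.4,
    "jump": 1.2,
    "fp_multiply": 6.0,
    "fp_add": 4.0,
    "fp_divide": 20.0,
    "fp_load_store": 1.5,
    "fp_other": 2.0,
}
BRANCH_TAKEN_CYCLES = 2.0
BRANCH_NOT_TAKEN_CYCLES = 1.5
TAKEN_FRACTION = 0.60  # "60% of the conditional branches are taken"


def average_mix(mix_a, mix_b):
    """Average two instruction mixes class by class (fractions in [0, 1])."""
    classes = set(mix_a) | set(mix_b)
    return {c: (mix_a.get(c, 0.0) + mix_b.get(c, 0.0)) / 2 for c in classes}


def effective_cpi(mix, taken_fraction=TAKEN_FRACTION):
    """Return the sum over classes of frequency * cycles.

    The hypothetical class "cond_branch" is split into taken and
    not-taken branches using taken_fraction.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-6:
        raise ValueError("instruction frequencies must sum to 1")
    branch_cycles = (taken_fraction * BRANCH_TAKEN_CYCLES
                     + (1.0 - taken_fraction) * BRANCH_NOT_TAKEN_CYCLES)
    cpi = 0.0
    for cls, freq in mix.items():
        cycles = branch_cycles if cls == "cond_branch" else CYCLES[cls]
        cpi += freq * cycles
    return cpi


# Placeholder mixes for illustration only; NOT the Figure A.28 data.
mix_x = {"alu": 0.5, "load_store": 0.3, "cond_branch": 0.2}
mix_y = {"alu": 0.3, "load_store": 0.3, "cond_branch": 0.2, "fp_add": 0.2}
print(effective_cpi(average_mix(mix_x, mix_y)))
```

With 60% of branches taken, a conditional branch costs 0.6 × 2.0 + 0.4 × 1.5 = 1.8 cycles on average. Instructions in the "other" category would be folded into the `alu` frequency before calling `effective_cpi`.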
A.5 [10] <A.8> Consider this high-level code sequence of three statements:
A = B + C;
B = A + C;
D = A - B;
Use the technique of copy propagation (see Figure A.20) to transform the code sequence
to the point where no operand is a computed value. Note the instances in which the
transformation has reduced the computational work of a statement and those cases where the
work has increased. What does this suggest about the technical challenge faced in trying to
satisfy the desire for optimizing compilers?
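One possible reading of the transformation is sketched below: each computed operand is replaced by the expression that defined it until every right-hand side refers only to the values of B and C on entry. The names `B0` and `C0` for those incoming values, and the two wrapper functions, are hypothetical and are used only so that the two versions can be compared.

```python
def original(B, C):
    """The three statements exactly as written in the exercise."""
    A = B + C   # 1 operation
    B = A + C   # 1 operation, operand A is a computed value
    D = A - B   # 1 operation, operands A and B are computed values
    return A, B, D


def propagated(B0, C0):
    """Same statements with every operand expressed in terms of B0 and C0."""
    A = B0 + C0                        # 1 operation (unchanged)
    B = (B0 + C0) + C0                 # 2 operations
    D = (B0 + C0) - ((B0 + C0) + C0)   # 4 operations
    return A, B, D


# Both versions compute the same values for any integer inputs.
assert original(3, 4) == propagated(3, 4)
```

In this reading the statements no longer depend on one another, but the operation count rises from 3 to 7 unless a later pass removes the repeated `B0 + C0` or simplifies the last statement algebraically.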
A.6 [30] <A.8> Compiler optimizations may result in improvements to code size and/or
performance. Consider one or more of the benchmark programs from the SPEC CPU2006
suite. Use a processor available to you and the GNU C compiler to optimize the program
using no optimization, -O1, -O2, and -O3. Compare the performance and size of the
resulting programs. Also compare your results to Figure A.21.
A.7 [20/20] <A.2, A.9> Consider the following fragment of C code:
for (i = 0; i <= 100; i++)
{ A[i] = B[i] + C; }
Assume that A and B are arrays of 64-bit integers, and C and i are 64-bit integers. Assume
that all data values and their addresses are kept in memory (at addresses 1000, 3000, 5000,
and 7000 for A, B, C, and i, respectively) except when they are operated on. Assume that
values in registers are lost between iterations of the loop.
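As a starting point for reasoning about this loop, the sketch below works out the iteration count and the element addresses implied by the stated layout. It assumes byte addressing and contiguous 8-byte elements, which the exercise does not state explicitly; the helper name `element_address` is hypothetical.

```python
ELEMENT_BYTES = 8  # 64-bit integers; byte addressing is an assumption

# Base addresses given in the exercise.
BASE = {"A": 1000, "B": 3000, "C": 5000, "i": 7000}


def element_address(array, index):
    """Address of array[index], assuming contiguous 8-byte elements."""
    return BASE[array] + ELEMENT_BYTES * index


# The loop condition is i <= 100, so the body runs for i = 0, 1, ..., 100.
ITERATIONS = len(range(0, 100 + 1))

print(ITERATIONS)
print(element_address("A", 0), element_address("A", 100))
print(element_address("B", 0), element_address("B", 100))
```

Because register contents are lost between iterations, every iteration has to fetch `i` and `C` from memory again in addition to reading `B[i]` and writing `A[i]`.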