Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

FIGURE 3.48 Code and latencies for Exercises 3.1 through 3.6.

3.1 [10] <1.8, 3.1, 3.2> What would be the baseline performance (in cycles, per loop iteration) of the code sequence in Figure 3.48 if no new instruction's execution could be initiated until the previous instruction's execution had completed? Ignore front-end fetch and decode. Assume for now that execution does not stall for lack of the next instruction, but only one instruction per cycle can be issued. Assume the branch is taken, and that there is a one-cycle branch delay slot.
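
Under the policy in Exercise 3.1 there is no overlap at all, so the cycle count is the sum of every instruction's full latency. The minimal Python sketch below assumes each instruction is described only by its extra latency N, using the "1 + N" notation from the hint to Exercise 3.2. The class `Instr`, the function `baseline_cycles`, and the loop body are hypothetical illustrations, not the contents of Figure 3.48. The sketch also leaves out the loop-closing branch and its delay slot.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str   # assembly text, for display only
    extra: int  # N in "latency 1 + N"


def baseline_cycles(program: list[Instr]) -> int:
    """Cycles per iteration when each instruction must finish before the next starts.

    With no overlap, every instruction holds the machine for its full
    latency of 1 + N cycles, so the total is the sum of those latencies.
    """
    return sum(1 + ins.extra for ins in program)


# Hypothetical loop body (NOT the code of Figure 3.48); branch omitted.
loop = [
    Instr("L.D    F0,0(R1)", 4),
    Instr("ADD.D  F4,F0,F2", 1),
    Instr("DADDUI R1,R1,#-8", 0),
    Instr("S.D    F4,8(R1)", 1),
]

if __name__ == "__main__":
    print(baseline_cycles(loop))
```

For this hypothetical body the sum is (1 + 4) + (1 + 1) + (1 + 0) + (1 + 1) = 10 cycles.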

3.2 [10] <1.8, 3.1, 3.2> Think about what latency numbers really mean: they indicate the number of cycles a given function requires to produce its output, nothing more. If the overall pipeline stalls for the latency cycles of each functional unit, then you are at least guaranteed that any pair of back-to-back instructions (a “producer” followed by a “consumer”) will execute correctly. But not all instruction pairs have a producer/consumer relationship. Sometimes two adjacent instructions have nothing to do with each other. How many cycles would the loop body in the code sequence in Figure 3.48 require if the pipeline detected true data dependences and only stalled on those, rather than blindly stalling everything just because one functional unit is busy? Show the code with <stall> inserted where necessary to accommodate the stated latencies. (Hint: An instruction with latency +2 requires two <stall> cycles to be inserted into the code sequence. Think of it this way: A one-cycle instruction has latency 1 + 0, meaning zero extra wait states. So, latency 1 + 1 implies one stall cycle; latency 1 + N has N extra stall cycles.)
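
The policy in Exercise 3.2 can be sketched as a small in-order scheduler. It records, for each register, the first cycle in which its newest value can be consumed. Following the hint, suppose a producer with latency 1 + N starts in cycle t. A consumer may then start in cycle t + 1 + N, so a back-to-back consumer waits N stall cycles.

The sketch makes several assumptions:

* One instruction issues per cycle.
* Only read-after-write (true) dependences cause stalls.
* Cycles are counted as issue slots, stalls included, from the first instruction to the last. Results still in flight afterward are not counted.
* The branch and its delay slot are left out.

The names `Instr`, `insert_stalls`, the fields `dest`/`srcs`/`extra`, and the loop body are hypothetical and are not taken from Figure 3.48.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str              # assembly text, for display only
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: tuple[str, ...]  # registers read
    extra: int             # N in "latency 1 + N"


def insert_stalls(program: list[Instr]) -> list[str]:
    """Return the loop body with '<stall>' lines inserted for true dependences only.

    Single issue, in order. A producer with latency 1 + N that starts in cycle t
    lets a consumer start in cycle t + 1 + N, i.e. N stalls if back to back.
    """
    ready: dict[str, int] = {}  # register -> first cycle a consumer may start
    listing: list[str] = []
    t = 0  # earliest cycle in which the next instruction could start
    for ins in program:
        # Wait only for the producers of this instruction's source registers.
        start = max([t] + [ready.get(r, 0) for r in ins.srcs])
        listing.extend(["<stall>"] * (start - t))
        listing.append(ins.text)
        if ins.dest is not None:
            ready[ins.dest] = start + 1 + ins.extra
        t = start + 1  # one instruction per cycle
    return listing


# Hypothetical loop body (NOT the code of Figure 3.48); branch omitted.
loop = [
    Instr("L.D    F0,0(R1)", "F0", ("R1",), 4),
    Instr("ADD.D  F4,F0,F2", "F4", ("F0", "F2"), 1),
    Instr("DADDUI R1,R1,#-8", "R1", ("R1",), 0),
    Instr("S.D    F4,8(R1)", None, ("F4", "R1"), 1),
]

if __name__ == "__main__":
    body = insert_stalls(loop)
    print("\n".join(body))
    print(len(body), "cycles")
```

In this hypothetical body, the load's 1 + 4 latency forces four stalls before ADD.D. S.D needs F4 one cycle after ADD.D would allow it (latency 1 + 1), yet it incurs no stall, because the independent DADDUI occupies that cycle. The resulting listing is 8 cycles long.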

3.3 [15] <3.6, 3.7> Consider a multiple-issue design. Suppose you have two execution pipelines, each capable of beginning execution of one instruction per cycle, and enough fetch/decode bandwidth in the front end so that it will not stall your execution. Assume results can be immediately forwarded from one execution unit to another, or to itself. Further assume that the only reason an execution pipeline would stall is to observe a true data dependency. Now how many cycles does the loop require?
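
For the two-pipeline machine of Exercise 3.3, a scheduler can let up to two instructions begin in the same cycle, provided neither is waiting on an operand. The Python sketch below rests on these assumptions:

* Issue is in order, so an instruction never starts before its predecessor. This is an assumption here; Exercise 3.4 raises the ordering question.
* Forwarding is full: a result with latency 1 + N that starts in cycle t is usable in cycle t + 1 + N.
* Cycles per iteration are taken as the last start cycle plus one.

The names `Instr`, `issue_cycles`, the `width` parameter, and the loop body are hypothetical, not the code of Figure 3.48.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str              # assembly text, for display only
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: tuple[str, ...]  # registers read
    extra: int             # N in "latency 1 + N"


def issue_cycles(program: list[Instr], width: int = 2) -> list[int]:
    """Return the cycle in which each instruction begins execution.

    In-order issue, up to `width` instructions per cycle, stalling only on
    true (read-after-write) dependences; results are forwarded when ready.
    """
    ready: dict[str, int] = {}  # register -> first cycle a consumer may start
    cycles: list[int] = []
    t, used = 0, 0  # current cycle and pipelines already used in it
    for ins in program:
        earliest = max([t] + [ready.get(r, 0) for r in ins.srcs])
        if earliest > t or used == width:
            # Advance: an operand is not ready yet, or every pipeline is taken.
            t = max(earliest, t + 1)
            used = 0
        cycles.append(t)
        used += 1
        if ins.dest is not None:
            ready[ins.dest] = t + 1 + ins.extra
    return cycles


# Hypothetical loop body (NOT the code of Figure 3.48); branch omitted.
loop = [
    Instr("L.D    F0,0(R1)", "F0", ("R1",), 4),
    Instr("L.D    F2,0(R2)", "F2", ("R2",), 4),
    Instr("ADD.D  F4,F0,F2", "F4", ("F0", "F2"), 1),
    Instr("S.D    F4,0(R1)", None, ("F4", "R1"), 1),
]

if __name__ == "__main__":
    starts = issue_cycles(loop)
    for ins, c in zip(loop, starts):
        print(f"cycle {c}: {ins.text}")
    print("cycles per iteration:", starts[-1] + 1)
```

With two pipelines the hypothetical body starts in cycles 0, 0, 5, and 7, or 8 cycles per iteration. Calling the same function with `width=1` gives 0, 1, 6, and 8, or 9 cycles, because the two independent loads can no longer start together.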

3.4 [10] <3.6, 3.7> In the multiple-issue design of Exercise 3.3, you may have recognized some subtle issues. Even though the two pipelines have the exact same instruction repertoire, they are neither identical nor interchangeable, because there is an implicit ordering between them that must reflect the ordering of the instructions in the original program. If instruction N + 1 begins execution in Execution Pipe 1 at the same time that instruction N