Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

EXCH R2,0(R1)

BNEZ R2, lockit

Assume that processors P0, P1, and P3 are all trying to acquire a lock at address 0x100 (i.e., register R1 holds the value 0x100). Assume the cache contents from Figure 5.35 and the timing parameters from Implementation 1 in Figure 5.36. For simplicity, assume that the critical sections are 1000 cycles long.

a. [20] <5.5> Using the simple spin lock, determine approximately how many memory stall cycles each processor incurs before acquiring the lock.

b. [20] <5.5> Using the optimized spin lock, determine approximately how many memory stall cycles each processor incurs before acquiring the lock.

c. [20] <5.5> Using the simple spin lock, approximately how many interconnect transactions occur?

d. [20] <5.5> Using the test-and-test-and-set spin lock, approximately how many interconnect transactions occur?
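To build intuition for parts c and d, the sketch below counts interconnect transactions for the lock's cache line while another processor holds the lock. It is a toy model under stated assumptions, not the protocol or timing of Figures 5.35 and 5.36: the states are a generic Invalid/Shared/Modified scheme, spinners take turns in round-robin order, and the lock is never released during the run. The names `LockLine` and `spin_transactions` are hypothetical, introduced only for this illustration.

```python
INVALID, SHARED, MODIFIED = "I", "S", "M"


class LockLine:
    """One cache line holding the lock word, tracked across n_cpus caches."""

    def __init__(self, n_cpus):
        self.state = [INVALID] * n_cpus
        self.value = 0          # 0 = free, 1 = held
        self.transactions = 0   # interconnect transactions so far

    def read(self, cpu):
        if self.state[cpu] == INVALID:
            # Read miss: one transaction; a Modified copy drops to Shared.
            self.transactions += 1
            self.state = [SHARED if s == MODIFIED else s for s in self.state]
            self.state[cpu] = SHARED
        return self.value

    def exchange(self, cpu, new):
        if self.state[cpu] != MODIFIED:
            # Needs exclusive ownership: one transaction, others invalidated.
            self.transactions += 1
            self.state = [INVALID] * len(self.state)
            self.state[cpu] = MODIFIED
        old, self.value = self.value, new
        return old


def spin_transactions(n_spinners, rounds, test_first):
    """Transactions generated by spinners while the lock stays held.

    test_first=False models the simple lock (exchange on every attempt);
    test_first=True models test-and-test-and-set (read before exchange).
    """
    line = LockLine(n_spinners + 1)
    holder = n_spinners
    line.exchange(holder, 1)            # holder acquires the lock
    start = line.transactions
    for _ in range(rounds):
        for cpu in range(n_spinners):
            if test_first and line.read(cpu) != 0:
                continue                # lock looks held: spin on cached copy
            line.exchange(cpu, 1)       # attempt fails, but still takes ownership
    return line.transactions - start


print(spin_transactions(2, 10, test_first=False))  # 20
print(spin_transactions(2, 10, test_first=True))   # 2
```

In this model, every failed exchange by the simple lock steals the line from the previous spinner, so traffic grows with the number of attempts. The test-and-test-and-set spinners each miss once and then spin on a shared cached copy. Release and handoff traffic is deliberately left out.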

5.8 [20/20/20/20] <5.6> Sequential consistency (SC) requires that all reads and writes appear to have executed in some total order. This may require the processor to stall in certain cases before committing a read or write instruction. Consider the following code sequence:

write A

read B

where the write A results in a cache miss and the read B results in a cache hit. Under SC, the processor must stall read B until after it can order (and thus perform) write A. Simple implementations of SC will stall the processor until the cache receives the data and can perform the write. Weaker consistency models relax the ordering constraints on reads and writes, reducing the cases in which the processor must stall. The Total Store Order (TSO) consistency model requires that all writes appear to occur in a total order but allows a processor's reads to pass its own writes. This allows processors to implement write buffers that hold committed writes that have not yet been ordered with respect to other processors' writes. Reads are allowed to pass (and potentially bypass) the write buffer in TSO (which they could not do under SC). Assume that one memory operation can be performed per cycle and that operations that hit in the cache or that can be satisfied by the write buffer introduce no stall cycles. Operations that miss incur the latencies listed in Figure 5.36. Assume the cache contents of Figure 5.35. How many stall cycles occur prior to each operation for both the SC and TSO consistency models?

a. [20] <5.6> P0: write 110 <-- 80

P0: read 108

b. [20] <5.6> P0: write 100 <-- 80

P0: read 108

c. [20] <5.6> P0: write 110 <-- 80

P0: write 100 <-- 90

d. [20] <5.6> P0: write 100 <-- 80

P0: write 110 <-- 90
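The difference between SC and TSO for a two-operation sequence can be sketched as a small function. This is a simplified model under loud assumptions, not a solution to the exercise: only the write-to-read relaxation described above is modelled, every other pairing is treated conservatively as under SC (a real write buffer might also hide the latency of a write behind an earlier write), and the miss latency is a made-up parameter because the Figure 5.36 values are not reproduced here. The name `stall_before_second` is hypothetical.

```python
def stall_before_second(first, second_kind, model):
    """Stall cycles before the second of two back-to-back operations.

    first:       (kind, miss_latency), kind is "read" or "write";
                 miss_latency is 0 when the operation hits in the cache.
    second_kind: "read" or "write".
    model:       "SC" or "TSO".
    """
    first_kind, first_miss_latency = first
    for kind in (first_kind, second_kind):
        if kind not in ("read", "write"):
            raise ValueError(f"unknown operation kind: {kind}")
    if model not in ("SC", "TSO"):
        raise ValueError(f"unknown model: {model}")

    # Assumed TSO rule: a read may pass the processor's own earlier write,
    # which waits in the write buffer, so the read does not stall behind it.
    if model == "TSO" and first_kind == "write" and second_kind == "read":
        return 0

    # Otherwise (SC, or a pairing TSO does not relax in this model) the
    # second operation waits until the first has been performed.
    return first_miss_latency


MISS = 100  # hypothetical miss latency in cycles, NOT taken from Figure 5.36

# The "write A (miss); read B (hit)" sequence from the text:
print(stall_before_second(("write", MISS), "read", "SC"))   # 100
print(stall_before_second(("write", MISS), "read", "TSO"))  # 0
```

With a write that hits, both models give zero stall cycles in this sketch, so the models only diverge when the first write misses and a read follows it.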

Case Study 2: Simple Directory-Based Coherence

Concepts illustrated by this case study

■ Directory Coherence Protocol Transitions
