EXCH R2,0(R1) ;atomically swap R2 with the lock word at 0(R1)
BNEZ R2,lockit ;old value nonzero: lock was held, so retry
Assume that processors P0, P1, and P3 are all trying to acquire a lock at address 0x100 (i.e.,
register R1 holds the value 0x100). Assume the cache contents from Figure 5.35 and the
timing parameters from Implementation 1 in Figure 5.36. For simplicity, assume that the
critical sections are 1000 cycles long.
a. [20] <5.5> Using the simple spin lock, determine approximately how many memory stall
cycles each processor incurs before acquiring the lock.
b. [20] <5.5> Using the optimized spin lock, determine approximately how many memory
stall cycles each processor incurs before acquiring the lock.
c. [20] <5.5> Using the simple spin lock, approximately how many interconnect
transactions occur?
d. [20] <5.5> Using the test-and-test-and-set spin lock, approximately how many
interconnect transactions occur?
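To see why parts (c) and (d) differ so much in traffic, the following Python sketch counts interconnect transactions for a single cache block under a simplified MSI invalidation protocol. This is a toy model under stated assumptions, not the protocol or timing of Figure 5.36. The helper `count_transactions` is hypothetical, each read miss, write miss, or S-to-M upgrade counts as exactly one transaction, and lock release and the critical sections are not modeled. Only the spinning while P1 holds the lock is modeled.

```python
def count_transactions(accesses):
    """Count interconnect transactions for accesses to ONE cache block
    under a toy MSI snooping-invalidate protocol (assumption: every
    read miss, write miss, or S->M upgrade costs exactly one
    transaction; writebacks and acknowledgments are not counted).

    accesses: iterable of (proc, op), where op is "R" for a plain
    load and "W" for a store or an atomic exchange such as EXCH,
    which needs the block in state M.
    """
    state = {}          # proc -> "M" or "S"; a missing key means Invalid
    transactions = 0
    for proc, op in accesses:
        current = state.get(proc)
        if op == "R":
            if current is None:            # read miss
                transactions += 1
                for p in state:            # any M owner downgrades to S
                    state[p] = "S"
                state[proc] = "S"
        elif op == "W":
            if current != "M":             # write miss or upgrade
                transactions += 1
                state = {proc: "M"}        # invalidate all other copies
        else:
            raise ValueError(f"unknown op {op!r}")
    return transactions


# Hypothetical scenario: P1 holds the lock while P0 and P3 each spin 5 times.
held = [("P1", "W")]
simple = held + [("P0", "W"), ("P3", "W")] * 5   # simple lock: EXCH on every spin
ttas = held + [("P0", "R"), ("P3", "R")] * 5     # test-and-test-and-set: spin on a load

print(count_transactions(simple))  # 11
print(count_transactions(ttas))    # 3
```

Each EXCH needs the block in state M, so under the simple spin lock ownership ping-pongs between the spinners on every iteration. The test-and-test-and-set loop spins on a shared copy that hits in the local cache, so it generates traffic only when that copy is invalidated.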
5.8 [20/20/20/20] <5.6> Sequential consistency (SC) requires that all reads and writes appear
to have executed in some total order. This may require the processor to stall in certain cases
before committing a read or write instruction. Consider the following code sequence:
write A
read B
where the write A results in a cache miss and the read B results in a cache hit. Under SC,
the processor must stall read B until after it can order (and thus perform) write A. Simple
implementations of SC will stall the processor until the cache receives the data and can
perform the write. Weaker consistency models relax the ordering constraints on reads and
writes, reducing the cases in which the processor must stall. The Total Store Order (TSO)
consistency model requires that all writes appear to occur in a total order but allows a
processor's reads to pass its own writes. This allows processors to implement write buffers
that hold committed writes that have not yet been ordered with respect to other processors'
writes. Reads are allowed to pass (and potentially bypass) the write buffer in TSO (which
they could not do under SC). Assume that one memory operation can be performed per
cycle and that operations that hit in the cache or that can be satisfied by the write buffer
introduce no stall cycles. Operations that miss incur the latencies listed in Figure 5.36.
Assume the cache contents of Figure 5.35. How many stall cycles occur prior to each operation
for both the SC and TSO consistency models?
a. [20] <5.6> P0: write 110 <-- 80
P0: read 108
b. [20] <5.6> P0: write 100 <-- 80
P0: read 108
c. [20] <5.6> P0: write 110 <-- 80
P0: write 100 <-- 90
d. [20] <5.6> P0: write 100 <-- 80
P0: write 110 <-- 90
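The ordering difference between SC and TSO described above can be sketched with a toy per-processor write buffer in Python. The class `ToyWriteBuffer` and its methods are hypothetical names for illustration. The buffer is assumed unbounded, and the sketch models only ordering and store-to-load forwarding, not caches or the miss latencies of Figure 5.36, so it does not compute stall cycles. Addresses follow the exercise (read as hex); the stored values and initial memory contents are placeholders.

```python
from collections import deque


class ToyWriteBuffer:
    """Toy per-processor FIFO write buffer (hypothetical class; assumes an
    unbounded buffer, word addresses, and no cache or coherence model)."""

    def __init__(self, memory):
        self.memory = memory      # shared memory: dict of addr -> value
        self.pending = deque()    # committed writes not yet performed

    def write(self, addr, value):
        # The write retires into the buffer; the processor does not wait.
        self.pending.append((addr, value))

    def drain_one(self):
        # Writes are performed in FIFO order, so this processor's writes
        # become visible to other processors in program order.
        if self.pending:
            addr, value = self.pending.popleft()
            self.memory[addr] = value

    def read_tso(self, addr):
        # TSO: the read may pass buffered writes. If an earlier write to
        # the same address is still buffered, forward the youngest value.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def read_sc(self, addr):
        # SC: the read may not pass earlier writes, so every buffered write
        # must be performed first (a real SC processor stalls here).
        while self.pending:
            self.drain_one()
        return self.memory.get(addr, 0)


# Usage, loosely following part (a): write 0x110, then read 0x108.
memory = {0x108: 7}                 # hypothetical initial contents
wb = ToyWriteBuffer(memory)
wb.write(0x110, 80)
print(wb.read_tso(0x108))           # 7: the read bypasses the buffered write
print(0x110 in memory)              # False: the write is still buffered
print(wb.read_tso(0x110))           # 80: forwarded from the buffer
print(wb.read_sc(0x108))            # 7, but only after draining the buffer
print(memory[0x110])                # 80
```

Under TSO the read of 0x108 completes while the write to 0x110 is still pending. Under SC the same read must wait until the buffer has drained, which is where the extra stall cycles come from.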
Case Study 2: Simple Directory-Based Coherence
Concepts illustrated by this case study
■ Directory Coherence Protocol Transitions