Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores - Runtime Verification


These are the potential data race accesses, and we call such addresses shared-modified. Since the impact of ST instructions on accuracy is very low, we do not apply any filtering on them. By filtering out innocuous LD instructions, we are able to bring down the false positive rate for GUARD without any negative impact on performance or data race detection capability.

We consider a memory hierarchy with private L1 caches and a shared last-level cache (LLC), which is common in current multicore processors. When the filtering mechanism is enabled, SG monitors the data response message from the LLC to check the shared-modified state. If the data was written to by another thread and is in the modified state in the LLC, the LLC controller sets the shared-modified state. When the state is set, SG concludes that the access is a potential data race candidate and adds the address to the RD signature. Otherwise, the address is filtered out. The filtering mechanism considers the following three scenarios:

- L1 Hit: When a LD instruction hits the L1 data cache, the data is either private or shared read-only. Such an access will not cause a data race, and hence it is considered safe and the address is filtered out.

- L1 Miss & LLC Hit: When a LD instruction misses the L1 data cache and hits the shared LLC, the LLC controller uses the coherence information to identify the state of the address. If the address was in a modified state prior to the load request, it was recently written to by another thread. Hence, this address is considered shared-modified and the corresponding bit is set in the response message.

- LLC Miss: If the access misses the shared LLC, it is potentially a cold miss or an access to the address after a long interval. Such accesses are considered safe as they will not cause a data race. Hence, the LLC controller resets the shared-modified bit and the address is filtered out.
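
The three scenarios above can be summarized as a small decision function. The following is a minimal software sketch under stated assumptions, not the hardware logic itself: the names `LoadOutcome`, `shared_modified_bit`, and `record_load` are hypothetical, the LLC coherence state is reduced to a single boolean, and a Python set stands in for the RD signature.

```python
from enum import Enum, auto


class LoadOutcome(Enum):
    """Where a LD instruction was serviced (hypothetical names)."""
    L1_HIT = auto()
    LLC_HIT = auto()
    LLC_MISS = auto()


def shared_modified_bit(outcome, modified_by_other_thread=False):
    """Bit the LLC controller would place in the data response message."""
    if outcome is LoadOutcome.LLC_HIT:
        # Set only if the line was in modified state, written by another thread.
        return modified_by_other_thread
    # L1 hits are private or shared read-only; LLC misses reset the bit.
    return False


def record_load(rd_signature, address, outcome, modified_by_other_thread=False):
    """Add the address to the RD signature only when it is shared-modified.

    Returns True if the address was kept, False if it was filtered out.
    """
    if shared_modified_bit(outcome, modified_by_other_thread):
        rd_signature.add(address)
        return True
    return False


rd_signature = set()  # assumption: a set stands in for the hardware signature
record_load(rd_signature, 0x1000, LoadOutcome.L1_HIT)
record_load(rd_signature, 0x2000, LoadOutcome.LLC_HIT, modified_by_other_thread=True)
record_load(rd_signature, 0x3000, LoadOutcome.LLC_HIT, modified_by_other_thread=False)
record_load(rd_signature, 0x4000, LoadOutcome.LLC_MISS)
print(sorted(hex(a) for a in rd_signature))  # ['0x2000']
```

Only the load that hits the LLC on a line modified by another thread survives the filter; the other three accesses are dropped.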

Even with these scenarios covered, a filtered access could still lead to a data race. In a potential write-after-read (WAR) race, when the read occurs first, there is not enough information to make a filtering decision. A later write to the same memory location by another thread in a concurrent epoch, however, results in a potential data race. Hence, if the LD instruction was filtered out due to insufficient information, a potential WAR race could be missed.

This issue can be addressed by using temporary hardware signatures. For every thread, the filtered LD addresses from the current epoch are compressed and stored in a temporary RD signature. When a ST occurs in a thread (a relatively infrequent event), the LLC controller sends invalidation messages to the sharers and the cache line is set to the modified state. The SG in each sharer compares the address in the invalidation message with the addresses in its temporary RD signature, and if there is a match, the address is added back to the thread's RD signature. However, only addresses from the current epoch can be recovered this way, as previous epochs would have already been dispatched to the GPU for data race detection.
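
The recovery path for filtered loads can be illustrated with a rough software analogy. This sketch makes several assumptions that the text does not confirm: a signature is modeled as a Bloom-filter-style bit vector (the text says only that addresses are "compressed"), cache lines are taken to be 64 bytes, the two hash functions are arbitrary, and the names `Signature` and `ThreadSignatures` are hypothetical. Clearing the temporary signature at the end of an epoch is likewise an assumption, made to reflect that only current-epoch addresses can be recovered.

```python
class Signature:
    """Bloom-filter-style address signature (an assumed encoding)."""

    def __init__(self, bits=1024):
        self.bits = bits
        self.vector = 0

    def _positions(self, address):
        line = address >> 6  # assumption: 64-byte cache lines
        # Two arbitrary illustrative hashes; real hardware would differ.
        return (line % self.bits, (line * 31 + 7) % self.bits)

    def add(self, address):
        for p in self._positions(address):
            self.vector |= 1 << p

    def may_contain(self, address):
        # May report false positives, never false negatives.
        return all((self.vector >> p) & 1 for p in self._positions(address))

    def clear(self):
        self.vector = 0


class ThreadSignatures:
    """Per-thread RD signature plus temporary signature of filtered LDs."""

    def __init__(self):
        self.rd = Signature()
        self.temp_rd = Signature()

    def filter_load(self, address):
        # A filtered LD is still remembered for the current epoch.
        self.temp_rd.add(address)

    def on_invalidation(self, address):
        # Invalidation sent by the LLC controller after another thread's ST.
        if self.temp_rd.may_contain(address):
            self.rd.add(address)

    def end_epoch(self):
        # Assumption: filtered addresses of a finished epoch are discarded.
        self.temp_rd.clear()


reader = ThreadSignatures()
reader.filter_load(0x1000)       # LD was filtered out (e.g. an L1 hit)
reader.on_invalidation(0x1000)   # another thread stores to the same line
print(reader.rd.may_contain(0x1000))  # True

late = ThreadSignatures()
late.filter_load(0x2000)
late.end_epoch()                 # epoch already dispatched
late.on_invalidation(0x2000)     # the ST arrives too late to recover the LD
print(late.rd.may_contain(0x2000))  # False
```

The second case mirrors the limitation stated above: once the epoch has ended, a later invalidation can no longer restore the filtered address.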

Also, the limited capacity of the LLC or the time gap between the two instructions could lead to the related cache line being evicted from the shared LLC. The scheme will then filter out the LD instruction due to the lack of information in the LLC. It should be emphasized, however, that the most crucial data race accesses are the ones that occur in close proximity, and those are unlikely to be filtered out due to this limitation.
