Effect on GPU Applications. GUARD shares the GPU computational power with
other GPU applications. Hence, while GUARD is enabled, other applications will have
fewer GPU resources available and their performance could suffer. GUARD, however, is
envisioned as a runtime tool that is exclusively used for debugging purposes and not
for continuous usage while other applications are utilizing GPU resources. Hence, the
impact of GUARD on the performance of other GPU applications is minimal.
Supporting Thread Migration and Simultaneous Multithreading. Thread migra-
tion in a multicore processor enables application threads to migrate from one core to
another. GUARD can support thread migration as the signature table entries correspond
to a thread, and are not tied to any particular core. When a thread migrates from a core,
the current signature is forcibly closed and transferred to the signature table for data
race detection. Additionally, GUARD can handle parallel applications that use more
threads than there are cores in the processor. Since the signature
table is stored in memory, instead of dedicated hardware, GUARD is able to adapt
to the number of threads utilized by the application. This capability also lets GUARD
support simultaneous multithreading.
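The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `SignatureTable`, its method names, and the use of a plain Python set in place of a Bloom-filter signature are all assumptions made for readability. The point it shows is that closed signatures are keyed by thread ID rather than core ID, so a migration only needs to close the open signature on the source core, and the table grows with the number of threads rather than the number of cores.

```python
from collections import defaultdict


class SignatureTable:
    """Hypothetical in-memory signature table, indexed by thread ID (not core ID)."""

    def __init__(self):
        # thread_id -> list of closed signatures (sets stand in for Bloom filters)
        self.closed = defaultdict(list)
        # core_id -> (thread_id, signature currently being built on that core)
        self.open = {}

    def schedule(self, core_id, thread_id):
        """Start a fresh open signature for a thread that begins running on a core."""
        self.open[core_id] = (thread_id, set())

    def record(self, core_id, address):
        """Add a memory access to the signature being built on this core."""
        self.open[core_id][1].add(address)

    def close(self, core_id):
        """Close the open signature and move it to the in-memory table."""
        thread_id, signature = self.open.pop(core_id)
        self.closed[thread_id].append(frozenset(signature))

    def migrate(self, thread_id, src_core, dst_core):
        """On migration, forcibly close the current signature, then resume elsewhere."""
        running_thread, _ = self.open[src_core]
        if running_thread != thread_id:
            raise ValueError("thread is not running on the source core")
        self.close(src_core)
        self.schedule(dst_core, thread_id)
```

In this sketch, a thread that migrates from core 0 to core 1 ends up with two closed signatures under the same thread ID, which is what lets detection proceed independently of where the thread ran.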
Hardware Support. In baseline GUARD, the only additional hardware support required
is the SG. We build the SG on top of well-studied, generic instruction-grain program
execution monitors [18, 19], which are used to extract execution traces efficiently.
Bloom filter hardware compresses the extracted traces into signatures. Hardware
buffers temporarily store the signature while an epoch is being created. For a
2048-bit signature (256 bytes), the combined RD/WR signature size is 512 bytes per core.
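As a rough software analogue of the signature hardware, the sketch below compresses addresses into a 2048-bit Bloom filter and reproduces the buffer-size arithmetic. Only the 2048-bit width and the separate read/write signatures come from the text; the number of hash functions, the SHA-256-based hashing, and all names (`Signature`, `per_core_buffer_bytes`) are assumptions for illustration and say nothing about how the actual hardware hashes addresses.

```python
import hashlib

SIGNATURE_BITS = 2048  # signature width taken from the text
NUM_HASHES = 2         # assumption: the text does not state the number of hash functions


class Signature:
    """Hypothetical Bloom-filter signature over memory addresses."""

    def __init__(self, bits=SIGNATURE_BITS):
        self.bits = bits
        self.value = 0  # a Python int used as a bit vector

    def _positions(self, address):
        # Assumption: software hashing via SHA-256 stands in for hardware hash logic.
        for i in range(NUM_HASHES):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def insert(self, address):
        """Set the bits selected by each hash of the address."""
        for position in self._positions(address):
            self.value |= 1 << position

    def may_contain(self, address):
        """True if the address might have been inserted (false positives possible)."""
        return all((self.value >> p) & 1 for p in self._positions(address))


def per_core_buffer_bytes(bits=SIGNATURE_BITS):
    """One read (RD) and one write (WR) signature per core, each bits/8 bytes."""
    return 2 * (bits // 8)
```

With the default width, `per_core_buffer_bytes()` evaluates to 2 × 256 = 512, matching the per-core figure quoted above. A Bloom filter never reports an inserted address as absent, but it can report an absent address as present, which is one source of false positives in signature-based detection.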
6 Conclusions
As the integration of data-parallel accelerator cores onto the modern multicore pro-
cessor becomes common, it is desirable to be able to utilize this computing power
for enhancing non-performance aspects of parallel execution. Concurrency bug detec-
tion, particularly data race detection, assumes increased importance in the current land-
scape of parallel computing. In this paper, we design, implement, and evaluate a GPU
Accelerated Data Race Detector (GUARD). GUARD utilizes GPU cores available on-
chip to perform data race detection for the multithreaded applications running on the
CPU cores. The GPU cores are employed for data race detection when they are not be-
ing utilized for performance acceleration of applications. This paper proposes several
optimizations each allowing a different trade-off between performance and accuracy of
data race detection: (i) accelerating CPU data race detection utilizing available on-chip
data-parallel cores; (ii) compressing generated memory traces, using Bloom filters, to
drastically reduce the computational requirement; and (iii) filtering out innocuous mem-
ory accesses, using coherence state information, to improve the accuracy of signatures.
Using a single GPU core (SM architecture described in Section 4), GUARD per-
forms data race detection on a 4-core CPU with 1.8% performance overhead and 18.8%
false positive rate. The coherence-based filtering mechanism reduces the false positive rate
by nearly 75%, without missing any data race conditions. Furthermore, by scaling the
 