Concurrency bug detection tools for applications executing on GPU architectures
have been proposed [14-16]. GUARD differs from these software-based mechanisms
as it targets data race detection for CPU applications by utilizing on-chip GPU cores. A
recent work, KUDA [17], uses GPU threads to improve the
performance of data race detection on CPU threads. GUARD, however, differs from
KUDA in several respects. KUDA needs binary instrumentation and the help of additional
CPU threads (worker threads) to extract the memory access trace.
Additionally, the memory trace compression technique employed by GUARD helps it
outperform KUDA.
2.3 Instruction-Grain Program Monitoring
Instruction-grain program monitors have been proposed to efficiently extract runtime
information from the CPU. These tools monitor programs at an instruction-level gran-
ularity and collect information such as program counter, instruction type, input/output
operands, and access addresses. Such monitors have been used for specialized pur-
poses such as memory checking, security tracking, and taint analysis [4,6,18]. Runtime
data race detection requires extraction of memory access information from the CPU
cores while the parallel applications are executing. General purpose instruction-grain
program monitors such as Log-based Architecture [19] can efficiently extract runtime
information from the CPU without significant hardware modifications. We previously
proposed utilizing hardware support to extract runtime information for dynamic
program execution monitoring [18]. GUARD utilizes similar hardware extraction
logic that tracks program execution and extracts the execution trace of the CPU
application being monitored. This extracted execution trace is then compressed into
signatures, which form the input to the data race detection algorithm. GUARD's Signature
Generator, described in the next section, is built on top of such previously proposed
instruction-grain program monitors.
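
To make the kind of information involved concrete, the following is a minimal software sketch of an instruction-grain trace record and of the filtering step that keeps only memory accesses. It is illustrative only: the names TraceRecord and memory_accesses, the field layout, and the thread_id field are assumptions made for this example, and the sketch does not describe the hardware extraction logic itself.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical per-instruction record; the field names are assumptions."""
    thread_id: int                 # assumed: identifies the issuing CPU thread
    pc: int                        # program counter of the instruction
    kind: str                      # instruction type, e.g. "load", "store", "alu"
    address: Optional[int] = None  # access address, present only for memory ops


def memory_accesses(trace: Iterable[TraceRecord]) -> Iterator[TraceRecord]:
    """Keep only the records that a race detector would need: loads and stores."""
    for rec in trace:
        if rec.kind in ("load", "store") and rec.address is not None:
            yield rec


trace = [
    TraceRecord(thread_id=0, pc=0x400, kind="load", address=0x1000),
    TraceRecord(thread_id=0, pc=0x404, kind="alu"),
    TraceRecord(thread_id=1, pc=0x500, kind="store", address=0x1000),
]
accesses = list(memory_accesses(trace))
```

Here the arithmetic instruction is dropped and the two accesses to the same address by different threads remain, which is the portion of the trace relevant to race detection.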
3 GPU Accelerated Data Race Detection
A snapshot of the basic GUARD mechanism is shown in Figure 2. The heterogeneous
architecture we model consists of CPU and GPU cores connected to each other and their
respective L2 caches through a common on-chip interconnection network (ICNT). Solid
lines with double arrows indicate data communication paths between the cores and
the caches through the interconnection network. Dotted lines indicate the flow of data
race detection related information in GUARD. Features of the heterogeneous multicore
processor modeled here are discussed in Section 4.
In GUARD, memory access trace generation is orchestrated by a dedicated hardware
component we refer to as the Signature Generator (SG). Extraction logic inside
the SG extracts the memory access trace of the application executing on the CPU
core. However, the volume of the trace generated at runtime makes it intractable to
process in real time, even with a GPU. To reduce the storage, communication, and
computation costs associated with managing these traces, they must be compressed
before processing. The SG uses a Bloom filter [20] to compress the extracted trace into
signatures.
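
The idea of compressing a set of accessed addresses into a fixed-size signature can be sketched in software as follows. This is a generic Bloom filter and not GUARD's hardware design: the class name BloomSignature, the signature width of 1024 bits, the use of three hash functions, and the choice of salted SHA-256 are all assumptions made for illustration.

```python
import hashlib


class BloomSignature:
    """Fixed-size bit-vector summary of a set of memory addresses."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits      # assumed signature width
        self.num_hashes = num_hashes  # assumed number of hash functions
        self.bits = 0                 # Python int used as a bit vector

    def _positions(self, address: int):
        # Derive num_hashes bit positions by prefixing a one-byte salt
        # to the 8-byte little-endian encoding of the address.
        data = address.to_bytes(8, "little")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, address: int) -> None:
        """Record an accessed address by setting its bit positions."""
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def might_contain(self, address: int) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


sig = BloomSignature()
for addr in (0x1000, 0x1008, 0x2000):
    sig.add(addr)
```

However many addresses are added, the signature stays at num_bits bits, which is what reduces storage and communication cost. The price is that might_contain can report false positives, but never false negatives.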
 