Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores - Runtime Verification

Concurrency bug detection tools for applications executing on GPU architectures have been proposed [14-16]. GUARD differs from these software-based mechanisms in that it targets data race detection for CPU applications by utilizing on-chip GPU cores. A recent work, KUDA [17], uses GPU threads to improve the performance of data race detection on CPU threads. GUARD, however, differs from KUDA in several respects. KUDA needs binary instrumentation and the help of additional CPU threads (worker threads) to extract the memory access trace. Additionally, the memory trace compression technique employed by GUARD helps it outperform KUDA.

2.3 Instruction-Grain Program Monitoring

Instruction-grain program monitors have been proposed to efficiently extract runtime information from the CPU. These tools monitor programs at instruction-level granularity and collect information such as the program counter, instruction type, input/output operands, and access addresses. Such monitors have been used for specialized purposes such as memory checking, security tracking, and taint analysis [4,6,18]. Runtime data race detection requires extracting memory access information from the CPU cores while the parallel applications are executing. General-purpose instruction-grain program monitors such as Log-based Architecture [19] can efficiently extract runtime information from the CPU without significant hardware modifications. Previously, we proposed utilizing hardware support for extracting runtime information for dynamic program execution monitoring [18]. GUARD utilizes a similar hardware extraction logic that tracks the program execution and extracts the execution trace of the CPU application being monitored. This extracted execution trace is then compressed into signatures, which form the input to the data race detection algorithm. GUARD's Signature Generator, described in the next section, is built on top of such previously proposed instruction-grain program monitors.
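To make the kind of information collected by such a monitor concrete, the following is a minimal sketch of a per-instruction trace record and of the filtering step that keeps only memory accesses. The record layout, the field names (`thread_id`, `pc`, `kind`, `operands`, `address`), and the helper `memory_accesses` are hypothetical and chosen for illustration; they are not taken from GUARD or from Log-based Architecture.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraceRecord:
    """One executed instruction as a monitor might log it (hypothetical layout)."""
    thread_id: int                  # assumed field: which CPU thread executed it
    pc: int                         # program counter
    kind: str                       # instruction type, e.g. "load", "store", "alu"
    operands: Tuple[str, ...]       # input/output operands (register names here)
    address: Optional[int] = None   # access address, set only for memory instructions


def memory_accesses(trace):
    """Keep only loads and stores, the records a data race detector needs."""
    return [r for r in trace if r.kind in ("load", "store") and r.address is not None]


# Illustrative trace: one ALU instruction, then a store and a load to the same address.
trace = [
    TraceRecord(0, 0x400100, "alu", ("r1", "r2", "r3")),
    TraceRecord(0, 0x400104, "store", ("r3",), 0x7F00),
    TraceRecord(1, 0x400200, "load", ("r4",), 0x7F00),
]
accesses = memory_accesses(trace)
```

In this sketch the ALU record is dropped and the two memory records remain, which is the subset of the execution trace that matters for data race detection.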

3 GPU Accelerated Data Race Detection

A snapshot of the basic GUARD mechanism is shown in Figure 2. The heterogeneous architecture we model consists of CPU and GPU cores connected to each other and their respective L2 caches through a common on-chip interconnection network (ICNT). Solid lines with double arrows indicate data communication paths between the cores and the caches through the interconnection network. Dotted lines indicate the flow of data race detection related information in GUARD. Features of the heterogeneous multicore processor modeled here are discussed in Section 4.

In GUARD, memory access trace generation is orchestrated by a dedicated hardware component that we refer to as the Signature Generator (SG). Extraction logic inside the SG extracts the memory access trace of the application executing on the CPU core. However, the volume of trace data generated at runtime makes it intractable to process in real time, even with a GPU. To reduce the storage, communication, and computation costs associated with managing these traces, they must be compressed before processing. The SG uses a Bloom filter [20] to compress the extracted trace into signatures.
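The compression idea can be illustrated in software with a minimal Bloom filter sketch that folds a set of memory addresses into a fixed-size bit vector. This is not the SG hardware design: the class name `Signature`, the signature size (1024 bits), the number of hash functions (3), the use of SHA-256 to derive bit positions, and the `may_overlap` comparison are all assumptions made for illustration.

```python
import hashlib


class Signature:
    """Fixed-size Bloom filter over memory addresses (illustrative parameters)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, address):
        # Derive num_hashes bit positions from one SHA-256 digest of the address.
        digest = hashlib.sha256(address.to_bytes(8, "little")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i: 4 * i + 4]
            yield int.from_bytes(chunk, "little") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def may_contain(self, address):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(address))

    def may_overlap(self, other):
        # An empty bitwise intersection proves the two address sets are disjoint.
        return (self.bits & other.bits) != 0


# Hypothetical per-thread signatures: thread 0 writes three addresses,
# thread 1 reads one of them.
writes_t0 = Signature()
reads_t1 = Signature()
for addr in (0x1000, 0x1008, 0x1010):
    writes_t0.add(addr)
reads_t1.add(0x1008)
```

However many addresses are inserted, the signature stays the same size, which is what cuts storage and communication cost. The trade-off is that a Bloom filter has no false negatives but can report false positives, so a positive membership or overlap test only indicates a possible shared address.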
