Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores - Runtime Verification - page 213

Table 2. Number of data race conditions detected by GUARD

Parsec          Races    Splash-2     Races
blackscholes    1        barnes       2
bodytrack       0        cholesky     2
canneal         1        fft          4
fluidanimate    4        lu           2
freqmine        0        ocean        0
streamcluster   7        radiosity    2
swaptions       1        raytrace     1
                         waterNS      0

5.1 Performance-Accuracy Trade-Offs

Although massively parallel, signature-comparison-based data race detection
involves a significant amount of computational work. If not properly managed,
this work could slow down the data race detection process and, in turn, stall
the CPU application. Here, we analyze the performance cost of GUARD and the
performance-accuracy trade-offs we could make. In particular, we look at two
main parameters of GUARD, signature size and throttling:

- We consider three signature sizes in our experiments: 2048 bits, 1024 bits,
  and 512 bits. The maximum size of an epoch is limited to 2000 instructions.
  The false positive rate increases with decreasing signature size, as
  discussed in Section 3.1.

- We consider three levels of parallelization (throttling), as discussed in
  Section 3.2: full, half, and quart. For an n-core CPU, the number of GPU
  threads required for GUARD throttling at T grows at the rate of
  $O(n^2 \cdot T)$.
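The effect of the two parameters can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not GUARD's actual design: the signature is modeled as a single-hash Bloom filter over accessed addresses (`make_signature` and its hash are hypothetical), the throttle levels full, half, and quart are mapped to T = 1, 1/2, and 1/4, the thread demand is modeled only up to the constant hidden by the O-notation, and the number of accesses per epoch is an arbitrary choice.

```python
import random

# Assumed mapping of throttle levels to the factor T (not stated in the text).
THROTTLE = {"full": 1.0, "half": 0.5, "quart": 0.25}


def relative_gpu_threads(n_cores: int, level: str) -> float:
    """Thread demand up to a constant factor, following the n^2 * T growth rate."""
    return n_cores ** 2 * THROTTLE[level]


def make_signature(addresses, bits: int) -> int:
    """Hypothetical signature: one bit set per address (single-hash Bloom filter)."""
    sig = 0
    for addr in addresses:
        # Toy multiplicative hash, chosen only for illustration.
        index = ((addr * 2654435761) >> 16) % bits
        sig |= 1 << index
    return sig


def may_conflict(sig_a: int, sig_b: int) -> bool:
    """Two epochs may conflict if their signatures share at least one set bit."""
    return (sig_a & sig_b) != 0


def false_conflict_rate(bits: int, accesses: int = 8,
                        trials: int = 4000, seed: int = 0) -> float:
    """Fraction of trials in which two epochs with disjoint addresses collide."""
    rng = random.Random(seed)
    false_hits = 0
    for _ in range(trials):
        # The two address sets come from disjoint ranges, so any reported
        # conflict is a false positive caused by signature aliasing.
        a = [rng.randrange(0, 1 << 20) for _ in range(accesses)]
        b = [rng.randrange(1 << 20, 1 << 21) for _ in range(accesses)]
        if may_conflict(make_signature(a, bits), make_signature(b, bits)):
            false_hits += 1
    return false_hits / trials


if __name__ == "__main__":
    for bits in (2048, 1024, 512):
        print(bits, "bits:", false_conflict_rate(bits))
    for level in ("full", "half", "quart"):
        print(level, "throttle, 4 cores:", relative_gpu_threads(4, level))
```

Under these assumptions, halving the signature size roughly doubles the chance that two unrelated epochs share a bit, which matches the direction of the trade-off described in the first item.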

Figure 4 presents the performance-accuracy trade-off characteristics of GUARD
for a 4-core CPU. The performance overhead (shown as bars) is evaluated as the
slowdown (% increase in cycles per instruction) of the CPU application being
monitored with GUARD, relative to its native execution. The values shown are
averages (geometric means) over all 15 benchmarks we evaluated. The accuracy
(shown as lines) is evaluated as the false positive rate (% of data races
reported that are false) for the signature size used in GUARD. In this section,
we consider the false positive rate without any filtering mechanism
(w/o Filter). We discuss the filtering mechanism (w/ Filter) later in this
section.
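The two metrics plotted in Figure 4 can be written out as a short Python sketch. The function names are hypothetical, and taking the geometric mean over per-benchmark CPI ratios (rather than over the percentages themselves) is an assumption; the text only states that a geometric mean over the benchmarks is reported.

```python
import math


def slowdown_percent(cpi_monitored: float, cpi_native: float) -> float:
    """Percent increase in cycles per instruction over native execution."""
    return (cpi_monitored / cpi_native - 1.0) * 100.0


def geometric_mean(values) -> float:
    """Geometric mean of strictly positive values."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mean_slowdown_percent(cpi_pairs) -> float:
    """Average slowdown over benchmarks.

    Assumption: the geometric mean is taken over the CPI ratios
    (monitored / native), then converted back to a percentage.
    """
    ratios = [monitored / native for monitored, native in cpi_pairs]
    return (geometric_mean(ratios) - 1.0) * 100.0


def false_positive_rate(reported: int, true_races: int) -> float:
    """Percent of reported data races that are false."""
    if reported == 0:
        return 0.0
    return (reported - true_races) / reported * 100.0


if __name__ == "__main__":
    # Made-up CPI values, used only to exercise the functions.
    pairs = [(1.05, 1.00), (2.10, 2.00), (0.84, 0.80)]
    print(mean_slowdown_percent(pairs))
    print(false_positive_rate(reported=10, true_races=8))
```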

We observe that the difference in throttle level is well pronounced in the
results. For any particular signature size, full throttle performs better than
half throttle, which in turn performs better than quart throttle. This is
expected, as the data race detection algorithm is extremely parallel and
performance improves as more GPU threads are assigned. Similarly, for any
particular throttle level, the performance of GUARD improves with decreasing
signature size, because the GPU kernel has fewer signature comparisons to
perform. However, this performance improvement is accompanied by an increase
in the false positive rate. We observe that at full throttle, we are able to
achieve near-zero performance overhead for data race detection on a 4-core
CPU. Furthermore, by scaling the number of GPU cores employed for data race
detection, GUARD is able to perform data race
