Table 2. Number of data race conditions detected by GUARD

Parsec          Races   Splash-2     Races
blackscholes    1       barnes       2
bodytrack       0       cholesky     2
canneal         1       fft          4
fluidanimate    4       lu           2
freqmine        0       ocean        0
streamcluster   7       radiosity    2
swaptions       1       raytrace     1
                        waterNS      0
5.1 Performance-Accuracy Trade-Offs
Although it is massively parallel, signature-comparison-based data race detection involves
a significant amount of computational work. If not properly managed, it could slow down
the data race detection process and, in turn, stall the CPU application. Here, we analyze
the performance cost of GUARD and the performance-accuracy trade-offs it permits.
In particular, we look at two main parameters of GUARD, signature size and throttling:
- We consider three signature sizes in our experiments: 2048 bits, 1024 bits, and
512 bits. The maximum size of an epoch is limited to 2000 instructions. The false
positive rate increases with decreasing signature size, as discussed in Section 3.1.
- We consider three levels of parallelization (throttling), as discussed in Section 3.2:
full, half, and quart. For an n-core CPU, the number of GPU threads required for
GUARD throttling at level T grows as $O(n^2 \cdot T)$.
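The two parameters above can be illustrated with a small Python sketch. It assumes Bloom-filter-style signatures (k hashed bit positions per address), flags a potential race when one core's write signature overlaps another core's read or write signature, and uses a hypothetical thread budget of 32 GPU threads per ordered core pair at full throttle, scaled by T. The signature scheme, the overlap rule, the numeric throttle fractions, and names such as `THREADS_PER_PAIR_FULL` and `approx_false_positive` are illustrative assumptions, not the scheme described in Sections 3.1 and 3.2.

```python
import hashlib
from itertools import permutations

# Throttle levels named in the text; the numeric fractions are assumed.
THROTTLE = {"full": 1.0, "half": 0.5, "quart": 0.25}
# Hypothetical: 32 GPU threads per ordered core pair at full throttle.
THREADS_PER_PAIR_FULL = 32


def signature(addresses, bits, k=2):
    """Bloom-filter-style signature (assumed scheme): k hashed bit positions per address.

    k must be at most 8, since a SHA-256 digest supplies eight 4-byte slices.
    """
    sig = 0
    for addr in addresses:
        digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
        for h in range(k):
            pos = int.from_bytes(digest[4 * h:4 * h + 4], "little") % bits
            sig |= 1 << pos
    return sig


def may_race(write_i, read_j, write_j):
    """Assumed rule: core i's writes overlap core j's reads or writes."""
    return (write_i & (read_j | write_j)) != 0


def detect(epochs, bits):
    """epochs[c] = (reads, writes) address sets of core c in one epoch.

    Compares every ordered pair of cores and returns the flagged (i, j) pairs.
    An overlap may come from a real conflict or from a hash collision
    (a false positive); a real shared address is never missed.
    """
    sigs = [(signature(r, bits), signature(w, bits)) for r, w in epochs]
    return [
        (i, j)
        for i, j in permutations(range(len(epochs)), 2)
        if may_race(sigs[i][1], sigs[j][0], sigs[j][1])
    ]


def approx_false_positive(n_write, n_access, bits, k=2):
    """Rough chance that signatures of disjoint address sets still overlap.

    Treats every bit as set independently, so this is only an estimate.
    """
    p_w = 1 - (1 - 1 / bits) ** (k * n_write)
    p_a = 1 - (1 - 1 / bits) ** (k * n_access)
    return 1 - (1 - p_w * p_a) ** bits


def gpu_threads(n_cores, level):
    """Hypothetical budget: n*(n-1) ordered pairs times a throttled share, i.e. O(n^2 * T)."""
    return n_cores * (n_cores - 1) * int(THREADS_PER_PAIR_FULL * THROTTLE[level])


if __name__ == "__main__":
    for level in THROTTLE:
        print(level, gpu_threads(4, level))  # 384, 192, 96 threads for 4 cores
    for bits in (2048, 1024, 512):
        print(bits, round(approx_false_positive(5, 5, bits), 3))
```

In this model, the bitwise AND over a smaller signature touches fewer bits, while the estimate from `approx_false_positive` grows as `bits` shrinks; the thread budget scales linearly with T and quadratically with the core count.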
Figure 4 presents the performance-accuracy trade-off characteristics of GUARD for
a 4-core CPU. The performance overhead (bars) is measured as the slowdown (% increase
in cycles per instruction) of the monitored CPU application over its native execution.
The values shown are averages (geometric means) over all 15 benchmarks we evaluated.
The accuracy (lines) is measured as the false positive rate (% of reported data races
that are false) for the signature size used in GUARD. Here, we consider the false
positive rate without any filtering mechanism (w/o Filter); we discuss the filtering
mechanism (w/ Filter) later in this section.
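The two metrics defined above can be computed as in the following Python sketch. The function names are hypothetical, and taking the geometric mean over per-benchmark CPI ratios (rather than over the percentages themselves) is an assumption: the text only states that the average is a geometric mean.

```python
import math


def slowdown_pct(cpi_native, cpi_guard):
    """Slowdown: % increase in cycles per instruction over native execution."""
    return (cpi_guard / cpi_native - 1) * 100


def geomean_slowdown_pct(cpi_pairs):
    """Geometric mean over benchmarks of the CPI ratios, reported as a % slowdown.

    cpi_pairs holds one (cpi_native, cpi_guard) tuple per benchmark.
    """
    logs = [math.log(guard / native) for native, guard in cpi_pairs]
    return (math.exp(sum(logs) / len(logs)) - 1) * 100


def false_positive_rate_pct(reported, false_reports):
    """% of reported data races that are false; 0 when nothing is reported."""
    return 100 * false_reports / reported if reported else 0.0


if __name__ == "__main__":
    # Hypothetical CPI values for two benchmarks, not measurements from Figure 4.
    pairs = [(1.0, 1.0), (1.0, 4.0)]
    print(geomean_slowdown_pct(pairs))
```

Averaging the ratios keeps every term positive, so a benchmark with zero overhead (ratio 1.0) does not break the logarithm, as it would if its 0% slowdown were averaged geometrically.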
The effect of the throttle level is clearly visible in the results. For any given
signature size, full throttle performs better than half throttle, which in turn
performs better than quart throttle. This is expected: the data race detection
algorithm is highly parallel, so assigning more GPU threads yields better performance.
Similarly, for any given throttle level, the performance of GUARD improves as the
signature size decreases, because the GPU kernel has fewer signature comparisons to
perform. However, this performance improvement comes with an increase in the false
positive rate. We observe that at full throttle, we achieve near-zero performance
overhead for data race detection on a 4-core CPU. Furthermore, by scaling the number
of GPU cores employed for data race detection, GUARD is able to perform data race