[Figure 3 (image not reproduced).
(a) Signature formation: A 2048-bit signature is divided into 8 bins, and 8 different
hash functions (h1 to h8) are applied to the 64-bit address to set the signature bits.
(b) Structure of the signature table for an n-core CPU. Curved arrows indicate the three
comparisons required by the H-B algorithm (WRx -> RDy, WRx -> WRy, RDx -> WRy) between
every pair of concurrent signatures of CPUs X and Y.]
Fig. 3. Signature formation process and the structure of the signature table
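The signature formation in Fig. 3(a) and the three comparisons in Fig. 3(b) can be illustrated with a minimal Python sketch. The hash functions below (derived from BLAKE2b) and the per-bin intersection test are assumptions chosen for illustration; the text does not specify the actual hash functions h1 to h8 or how two signatures are compared in hardware. The function names are hypothetical.

```python
import hashlib

SIG_BITS = 2048                   # signature width, from Fig. 3(a)
NUM_BINS = 8                      # one hash function per bin
BIN_BITS = SIG_BITS // NUM_BINS   # 256 bits per bin


def bin_hash(address: int, bin_index: int) -> int:
    """Hypothetical stand-in for h1..h8: map a 64-bit address to a bit in one bin."""
    data = bytes([bin_index]) + address.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=2).digest()
    return int.from_bytes(digest, "little") % BIN_BITS


def add_address(signature: int, address: int) -> int:
    """Set one bit per bin for the given address and return the new signature."""
    for b in range(NUM_BINS):
        signature |= 1 << (b * BIN_BITS + bin_hash(address, b))
    return signature


def may_conflict(sig_a: int, sig_b: int) -> bool:
    """Assumed test: a shared address requires a common set bit in every bin."""
    mask = (1 << BIN_BITS) - 1
    return all(
        (sig_a >> (b * BIN_BITS)) & (sig_b >> (b * BIN_BITS)) & mask
        for b in range(NUM_BINS)
    )


def epochs_race(rd_x: int, wr_x: int, rd_y: int, wr_y: int) -> bool:
    """The three checks from Fig. 3(b): WRx->RDy, WRx->WRy, RDx->WRy."""
    return (
        may_conflict(wr_x, rd_y)
        or may_conflict(wr_x, wr_y)
        or may_conflict(rd_x, wr_y)
    )
```

Like any hashed signature, this scheme can report a conflict between two different addresses that happen to set the same bits, but it does not miss a conflict on an address that both epochs actually recorded. A read-read overlap is not checked, since it does not constitute a race.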
Signature Table Size: Because instruction frequency differs across cores, an epoch in
one core may need to be H-B compared with a large number of concurrent epochs from
another core. The H-B scheme would therefore need a large number of entry slots in the
signature table to perform ideal data race detection without missing any concurrent
epochs. In practical systems such as GUARD, however, we cannot afford such a large
signature table. Besides the storage cost, a larger table also incurs a greater
performance penalty, because the data race detector has to perform many more H-B
signature comparisons. Limiting the number of entries in the signature table, on the
other hand, inevitably means that some concurrent epochs are never compared; we refer
to this metric as Missed Epoch Comparisons.
In our experiments, we evaluate the missed epoch comparisons for a 16-entry and a
64-entry signature table against an ideal signature table with an infinite number of
entries. The 16-entry signature table misses 3.16% of epoch comparisons and the
64-entry signature table misses 0.12%, relative to the ideal signature table. Since the
64-entry signature table incurs a significantly higher performance overhead than the
16-entry signature table for only a small reduction in missed epoch comparisons, we
chose to evaluate GUARD with a 16-entry signature table.
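The following toy model illustrates why a bounded table misses comparisons. It assumes FIFO replacement and a hypothetical function name, neither of which is stated in the text, and it is not the authors' experimental setup: the percentages above come from their measurements, not from this sketch.

```python
from collections import deque


def count_missed(concurrent_epochs: int, table_entries: int) -> int:
    """Count epochs of one core that are overwritten in a fixed-size FIFO table
    before another core's epoch could be compared against all of them."""
    table = deque(maxlen=table_entries)
    missed = 0
    for epoch_id in range(concurrent_epochs):
        if len(table) == table.maxlen:
            missed += 1          # the oldest epoch is evicted uncompared
        table.append(epoch_id)   # deque drops the oldest entry when full
    return missed
```

Under this model, a core that produces 20 concurrent epochs against a 16-entry table loses 4 of them, while a 64-entry table loses none; an ideal (unbounded) table never loses any.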
The size of the signature table grows linearly with the number of CPU cores monitored
and with the number of signature entries. Even for a small core count and a small
number of signature entries, this is a high overhead for a dedicated on-chip hardware
structure. For example, a four-core CPU with 2048-bit signatures and 16 signature
entries has a signature table size of 32 kilobytes [5]. GUARD instead stores the
signature table in the GPU last-level cache (LLC), which requires no additional
hardware. Because GUARD shares the LLC space with other GPU applications, the space is
reusable. Designs that store data race detection information as an extension of the
cache line, such as HARD [6], lose detection opportunities when those lines are
evicted. GUARD does not suffer from this limitation, because its signatures are not
based on information held in cache line extensions.
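The linear growth of the table can be checked with a short calculation. The sketch assumes that each table entry holds two signatures, one for reads and one for writes; this is inferred from the RD/WR comparisons in Fig. 3 and is not stated explicitly in the text, although it is what reconciles the parameters with the quoted 32 kilobytes. The function name and the `signatures_per_entry` parameter are hypothetical.

```python
def signature_table_bytes(
    cores: int,
    entries: int,
    signature_bits: int = 2048,
    signatures_per_entry: int = 2,  # assumption: one read and one write signature
) -> int:
    """Total signature table size in bytes; linear in cores and in entries."""
    total_bits = cores * entries * signatures_per_entry * signature_bits
    return total_bits // 8


# Four cores, 16 entries, 2048-bit signatures: 32768 bytes (32 KB).
print(signature_table_bytes(cores=4, entries=16))
```

Doubling either the core count or the number of entries doubles the result, so a 64-entry table for the same four cores would need four times the space of the 16-entry one.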
 