Hardware Reference
In-Depth Information
[Fig. 4.15 Barrier registers for synchronization. The figure shows eight cores (Core0 to Core7, labeled CPU0 to CPU7) grouped into Cluster0 and Cluster1, with a BARW and a BARR for each core.]
[Fig. 4.16 Synchronization example using barrier registers. The figure shows the following sequence for Core 0 to Core 7:]
Barrier Initialization (each core clears its BARW to zero)
Executions (each core runs and sets its BARW to one at a specific point)
Barrier Synchronization (each core waits until its BARR is all ones)
Executions (each core runs and clears its BARW to zero at a specific point)
Barrier Synchronization (each core waits until its BARR is all zeros)
Table 4.5 Eight-core synchronization cycles
                         Conventional method       RP-2 method
                         (via external memory)     (via BARW/BARR registers)
Average clock cycles     52,396                    8,510
Average difference       20,120                    10
when it reaches a specific point, and it then checks and waits until all its BARR
values are ones, reflecting the BARW values. The synchronization is complete when
all the BARW values have been inverted to ones. The next synchronization can start
immediately with the BARW values at one, and it is complete when all the BARW
values have been inverted to zeros.
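The alternating ones/zeros scheme described above can be sketched as a small software simulation. This is only a toy model, not the RP-2 hardware or its programming interface: the names `BarrierRegisters`, `core_program`, and `run` are hypothetical, each core's BARR is assumed to mirror all eight BARW bits, and each core is assumed to do at least one step of work between consecutive barriers so that every core observes a completed barrier before any BARW bit is inverted again.

```python
NUM_CORES = 8  # eight cores, as in the text


class BarrierRegisters:
    """Toy model (assumption): one BARW bit per core, and every
    core's BARR simply reflects all BARW bits."""

    def __init__(self, num_cores):
        self.barw = [0] * num_cores

    def write_barw(self, core, value):
        self.barw[core] = value

    def read_barr(self, core):
        # Snapshot of all BARW bits as seen by this core.
        return tuple(self.barw)


def core_program(core, regs, work_steps, log):
    """Generator modelling one core; each yield is one simulated time step."""
    regs.write_barw(core, 0)            # barrier initialization
    yield
    target = 1                          # first barrier waits for all ones
    for phase, steps in enumerate(work_steps):
        assert steps >= 1, "model assumes some work between barriers"
        for _ in range(steps):          # execution phase
            yield
        regs.write_barw(core, target)   # reached the synchronization point
        while any(bit != target for bit in regs.read_barr(core)):
            yield                       # spin until BARR matches the target
        log.append((phase, core))       # this core has passed barrier `phase`
        target ^= 1                     # next barrier uses the inverted value


def run(work_table, max_ticks=10_000):
    """Round-robin scheduler: every live core advances one step per tick."""
    regs = BarrierRegisters(NUM_CORES)
    log = []
    alive = [core_program(c, regs, work_table[c], log)
             for c in range(NUM_CORES)]
    ticks = 0
    while alive:
        if ticks >= max_ticks:
            raise RuntimeError("simulation did not finish (possible deadlock)")
        still_running = []
        for gen in alive:
            try:
                next(gen)
                still_running.append(gen)
            except StopIteration:
                pass
        alive = still_running
        ticks += 1
    return log, regs.barw


# Hypothetical workload: 3 barrier phases, 1 to 5 work steps per phase.
work_table = [[1 + (c * 3 + p) % 5 for p in range(3)]
              for c in range(NUM_CORES)]

if __name__ == "__main__":
    log, barw = run(work_table)
    print("order in which cores passed each barrier:", log)
    print("final BARW bits:", barw)
```

Because the target value alternates between one and zero, no separate reset step is needed between barriers, which is the property the text points to when it says the next synchronization can start immediately.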
Table 4.5 compares the results of eight-core synchronizations with and without
the barrier registers. The average number of clock cycles required to complete a
certain task is 8,510 cycles with the barrier registers and 52,396 cycles without
them. The average difference in synchronizing cycles between the first and last
cores is 10 cycles with the barrier registers and 20,120 cycles without them.
These results show that the barrier registers substantially reduce both the total
cycle count and the spread between the first and last cores to synchronize.
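To put the Table 4.5 figures in proportion, the ratios can be worked out directly. The snippet below is only arithmetic on the four numbers quoted in the table; the variable names are illustrative.

```python
# Values copied from Table 4.5 (average clock cycles).
conventional_cycles = 52_396   # via external memory
barrier_cycles = 8_510         # via BARW/BARR registers

# Values copied from Table 4.5 (average first-to-last core difference).
conventional_diff = 20_120
barrier_diff = 10

# Ratio of conventional to barrier-register cost for each metric.
cycle_ratio = conventional_cycles / barrier_cycles
diff_ratio = conventional_diff / barrier_diff

if __name__ == "__main__":
    print(f"total cycles: about {cycle_ratio:.2f} times fewer")
    print(f"first-to-last difference: about {diff_ratio:.0f} times smaller")
```

In other words, the task takes roughly one sixth of the cycles with the barrier registers, and the first-to-last core difference shrinks by a factor of about two thousand.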