have small work-groups. Because the work-group divergence that results from
small work-groups hurts cache sharing between work-groups, we have two
competing behaviors and need to find a good compromise. On the one hand, we may
want to select a small work-group size to ensure that barriers are cheap (and to
use an NT algorithm, since we saw that it performs better with small work-groups).
On the other hand, we will want to choose a large work-group size to keep
work-group divergence small, which means having only a small number of active
work-groups.
We also have to find a good compromise for the value of the parameter
$\Delta K_{\text{cache}}$. It must be small enough to keep thread divergence low,
but large enough that we do not execute the barrier too often, so that the cost
of executing the barriers stays low.
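The barrier is executed $N/(\Delta K_{\text{reg}}\,\Delta K_{\text{cache}})$ times, with a cost of $2\lambda_0\lambda_1$ cycles
each time, and we want to ensure that this cost is small compared to the number
of memory operations. The number of memory operations performed by the
work-group between two consecutive barriers is
$\Delta K_{\text{cache}}(\Delta I\,\Delta K_{\text{reg}} + \Delta K_{\text{reg}}\,\Delta J)\,\lambda_0\lambda_1$,
and our condition for $\Delta K_{\text{cache}}$ becomes
$$\Delta K_{\text{cache}}(\Delta I\,\Delta K_{\text{reg}} + \Delta K_{\text{reg}}\,\Delta J)\,\lambda_0\lambda_1 > 2\lambda_0\lambda_1,$$
which we rewrite as
$$\Delta K_{\text{cache}} > \frac{2\lambda_0\lambda_1}{\Delta I\,\Delta K_{\text{reg}} + \Delta K_{\text{reg}}\,\Delta J}.$$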
For the $2 \times 4 \times 2$ blocking with a large work-group size, we find that we need
$$\Delta K_{\text{cache}} > \frac{2\,[\#\text{active threads}]}{8+8} = \frac{256}{16} = 16.$$
For our kernel, we have chosen $\Delta K_{\text{cache}} = 32$.
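As a quick numerical check of the bound for $\Delta K_{\text{cache}}$, the following Python sketch evaluates the right-hand side of the rearranged inequality. It is a minimal illustration under stated assumptions, not the authors' code: the 2 × 4 × 2 blocking is read as $\Delta I = 2$, $\Delta K_{\text{reg}} = 4$, $\Delta J = 2$ (which reproduces the 8 + 8 in the denominator), and $\lambda_0\lambda_1$ is taken to equal the 128 active threads implied by 2[#active threads] = 256. The function name `min_dk_cache` is hypothetical.

```python
# Sketch: evaluate the barrier-amortization bound
#   dK_cache > 2 * lambda0 * lambda1 / (dI * dK_reg + dK_reg * dJ)
# Assumptions (not stated in the text): the 2x4x2 blocking means
# dI = 2, dK_reg = 4, dJ = 2, and with a large work-group
# lambda0 * lambda1 equals the number of active threads (128).

def min_dk_cache(active_threads: int, d_i: int, d_k_reg: int, d_j: int) -> float:
    """Return the lower bound on dK_cache implied by the barrier-cost condition."""
    # Denominator of the bound: 8 + 8 = 16 for the assumed 2x4x2 blocking.
    mem_ops_per_block = d_i * d_k_reg + d_k_reg * d_j
    return 2 * active_threads / mem_ops_per_block


bound = min_dk_cache(active_threads=128, d_i=2, d_k_reg=4, d_j=2)
chosen = 32  # value chosen for the kernel in the text
print(f"dK_cache must exceed {bound:g}; chosen value {chosen} satisfies it: {chosen > bound}")
```

With these inputs the bound evaluates to 16, so the chosen value of 32 leaves a factor-of-two margin.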
7.5.9 Page Table Lookups
One aspect that we have not yet discussed, but that is important for the
performance of the NN and NT versions, is the way they affect the memory
management unit. The physical memory is partitioned into areas called pages,
and the abstract pointer space is also partitioned into pages in a similar way.
Every page that we use in the pointer space must correspond to a page in
physical memory, and information about this mapping is stored in page tables.
When we access memory through a pointer, we need to find out which page it
points to in pointer space, and which page that corresponds to in physical
memory. Looking up a page in the page tables is costly, so we want to perform
as few page table lookups as possible when executing our kernel.²³
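To make this cost concrete, here is a minimal Python sketch, not taken from the text, that maps addresses to page numbers and counts how many distinct pages a traversal touches. It assumes 4 KiB pages, 4-byte elements, a page-aligned row-major matrix, and that every distinct page costs one lookup (ignoring any caching of translations); the helper names are hypothetical. Under these assumptions, reading one row of a 1024 × 1024 matrix stays within a single page, while reading one column touches a different page for every element.

```python
# Sketch: count distinct pages touched by different traversals of a matrix.
# Assumptions (not from the text): 4 KiB pages, 4-byte floats, row-major
# storage starting at a page-aligned address, one lookup per distinct page
# (translation caching such as a TLB is ignored).

PAGE_SIZE = 4096  # bytes per page (assumed)
ELEM_SIZE = 4     # bytes per float (assumed)


def page_of(addr: int) -> int:
    """Page number in pointer space containing the given byte address."""
    return addr // PAGE_SIZE


def pages_touched(addresses) -> int:
    """Number of distinct pages, i.e. page table lookups in this simple model."""
    return len({page_of(a) for a in addresses})


def row_addresses(base: int, n_cols: int, row: int):
    """Byte addresses of every element in one row of a row-major matrix."""
    return [base + (row * n_cols + c) * ELEM_SIZE for c in range(n_cols)]


def column_addresses(base: int, n_rows: int, n_cols: int, col: int):
    """Byte addresses of every element in one column of a row-major matrix."""
    return [base + (r * n_cols + col) * ELEM_SIZE for r in range(n_rows)]


n = 1024
print("row:", pages_touched(row_addresses(0, n, row=0)))
print("column:", pages_touched(column_addresses(0, n, n, col=0)))
```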
Going back
²³ This description is simplified but sufficient for our current purposes.