26.3 Parallelization Strategies
In the algorithm discussion below, we consider the vector field data to be decomposed
into a number of spatially disjoint blocks. For time-varying vector fields,
the blocks are assumed to be four-dimensional, i.e. to encompass a time interval in
addition to a spatial subset. There are two straightforward parallelization approaches:
partition the computational workload, with seed points as the elementary
unit of work, or partition the data set, distributing data blocks across processors.
26.3.1 Parallelization Over Seeds
The parallelize-over-seeds (POS) algorithm parallelizes over the set of seed points,
assigning to each processor a fixed number of seed points from which integral curves
are propagated. Data blocks are loaded on demand, i.e. when integration
must be continued in a block that is not present in memory. Communication
is not required in this scheme except to synchronize processors at initialization and
termination. Clearly, the dominant cost of this scheme is I/O, which can be expressed
as the number of blocks loaded. To reduce the overall I/O cost, blocks can be kept in
processor memory for reuse (caching).
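The per-processor loop of this scheme can be sketched as follows. This is a hypothetical toy model, not the authors' implementation: the domain is one-dimensional, and `load_block`, `block_of`, `advance`, and the cache capacity are stand-ins for the real data access and integration machinery.

```python
# Hypothetical sketch of parallelize-over-seeds (POS): a processor advances
# its assigned seeds, loading data blocks on demand and caching them with
# LRU eviction. Domain, block lookup, and integration are toy stand-ins.
from collections import OrderedDict

CACHE_CAPACITY = 4  # max resident blocks per processor (assumed)

def load_block(block_id):
    """Stand-in for reading one data block from disk."""
    return {"id": block_id}

def block_of(point):
    """Map a point to the id of the block containing it (1-D toy domain)."""
    return int(point) // 10

def advance(point, block):
    """One integration step inside `block`; toy unit forward step."""
    return point + 1.0

def trace_seeds(seeds, max_steps=25):
    cache = OrderedDict()   # block_id -> block, in least-recently-used order
    io_loads = 0            # dominant cost measure: number of blocks loaded
    curves = []
    for seed in seeds:
        point, curve = seed, [seed]
        for _ in range(max_steps):
            bid = block_of(point)
            if bid in cache:
                cache.move_to_end(bid)          # mark as recently used
            else:
                if len(cache) >= CACHE_CAPACITY:
                    cache.popitem(last=False)   # evict least recently used
                cache[bid] = load_block(bid)
                io_loads += 1
            point = advance(point, cache[bid])
            curve.append(point)
        curves.append(curve)
    return curves, io_loads
```

Nearby seeds reuse cached blocks: in this toy setting, a second seed started next to the first incurs no additional loads, while a distant seed pays for a fresh set of blocks.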
Pugmire et al. [16] provided a detailed analysis of the performance of POS over
a range of streamline problems. They make aggressive use of caching: new blocks
are only loaded when no streamline can be continued on the currently resident blocks.
Cache eviction follows least-recently-used (LRU) order.
The initial assignment of seed points to processors is chosen based on spatial
proximity, following the reasoning that the integral curves traced from spatially close
seed points are likely to traverse the same regions, and thus blocks, of a dataset. This
approach is effective in increasing block reuse, especially for dense seed sets. They
also sketch worst-case scenarios: for example, if streamlines travel in near-cycles,
and the number of blocks encountered along these cycles exceeds the size
of the cache, performance drops significantly.
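A minimal sketch of proximity-based seed assignment, under assumptions not taken from the source: a one-dimensional domain, a coarse grid cell as the proximity measure, and equal-sized contiguous chunks per processor. In three dimensions, ordering along a space-filling curve would serve the same purpose.

```python
# Hypothetical sketch: order seeds by coarse grid cell so that each
# processor receives a spatially compact chunk, increasing the chance
# that its integral curves share data blocks.
def assign_seeds(seeds, num_procs, cell=10.0):
    # Sort by coarse 1-D grid cell; a space-filling curve generalizes
    # this ordering to 3-D domains.
    ordered = sorted(seeds, key=lambda p: int(p // cell))
    chunk = (len(ordered) + num_procs - 1) // num_procs
    return [ordered[i * chunk:(i + 1) * chunk] for i in range(num_procs)]
```

With this ordering, seeds that fall into the same or neighboring cells end up on the same processor, which is what makes block reuse likely for dense seed sets.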
In general, caching can only be effective if block reuse is high; this is often the case
for stationary fields, but less so for time-varying fields, since curves advance
monotonically along the time axis and thus frequently traverse previously
unencountered spatiotemporal blocks.
A different approach to reducing I/O cost was described by Wolter et al. [21], who
introduced intelligent prefetching to overlap I/O and computation. By building
a Markov model of block transitions, they are able to predict likely future blocks
which can then be loaded ahead of time. Two strategies are proposed for initializing
the Markov model, both requiring pre-computation to build a transition probability
model. They report much improved performance in the stationary case, but point out
limitations for time-varying problems that stem from constructing the prediction
model on individual time slices.
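The transition model above can be sketched as a first-order Markov chain over block ids. This is an illustrative reconstruction, not the authors' code: the class name, the training data, and the choice to prefetch the top-k successors are assumptions.

```python
# Hypothetical sketch of Markov-model prefetching: from observed block
# transitions, estimate P(next block | current block) and prefetch the
# most likely successor blocks while integration proceeds in the current one.
from collections import Counter, defaultdict

class TransitionModel:
    def __init__(self):
        self.counts = defaultdict(Counter)  # block -> Counter of successors

    def record(self, cur, nxt):
        """Record one observed block transition (e.g. from pre-computation)."""
        self.counts[cur][nxt] += 1

    def predict(self, cur, k=2):
        """Return up to k most probable successor blocks of `cur`."""
        return [b for b, _ in self.counts[cur].most_common(k)]

model = TransitionModel()
# Toy transition observations gathered in a pre-computation pass (assumed).
for cur, nxt in [(0, 1), (0, 1), (0, 2), (1, 3)]:
    model.record(cur, nxt)
# Candidate blocks to prefetch while working in block 0:
candidates = model.predict(0)
```

An I/O thread would issue reads for `candidates` in the background, so that by the time a curve leaves block 0, its likely destination is already resident.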