26.3 Parallelization Strategies
In the algorithm discussion below, we consider the vector field data to be decomposed
into a number of spatially disjoint blocks. For time-varying vector fields,
the blocks are assumed to be four-dimensional, i.e. to encompass a time interval in
addition to a spatial subset. There are two straightforward parallelization approaches:
partition the computational workload, with seed points as the elementary
unit of work, or partition the data set, distributing data blocks across processors.
26.3.1 Parallelization Over Seeds
The parallelize-over-seeds (POS) algorithm parallelizes over the set of seed points,
assigning to each processor a fixed number of seed points from which integral curves
are propagated. Data blocks are loaded on demand, i.e. when integration
must be continued in a block that is not present in memory. Communication
is not required in this scheme except to synchronize processors at initialization and
termination. Clearly, the dominant cost of this scheme is I/O, which can be expressed
as the number of blocks loaded. To reduce the overall I/O cost, blocks can be kept in
processor memory for reuse (caching).
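The per-processor loop of this scheme can be sketched as follows. This is a hypothetical toy model, not the authors' implementation: the domain is one-dimensional, and `load_block`, `block_of`, `advance`, and the cache capacity are stand-ins for the real data access and integration machinery.

```python
# Hypothetical sketch of parallelize-over-seeds (POS): a processor advances
# its assigned seeds, loading data blocks on demand and caching them with
# LRU eviction. Domain, block lookup, and integration are toy stand-ins.
from collections import OrderedDict

CACHE_CAPACITY = 4  # max resident blocks per processor (assumed)

def load_block(block_id):
    """Stand-in for reading one data block from disk."""
    return {"id": block_id}

def block_of(point):
    """Map a point to the id of the block containing it (1-D toy domain)."""
    return int(point) // 10

def advance(point, block):
    """One integration step inside `block`; toy unit forward step."""
    return point + 1.0

def trace_seeds(seeds, max_steps=25):
    cache = OrderedDict()   # block_id -> block, in least-recently-used order
    io_loads = 0            # dominant cost measure: number of blocks loaded
    curves = []
    for seed in seeds:
        point, curve = seed, [seed]
        for _ in range(max_steps):
            bid = block_of(point)
            if bid in cache:
                cache.move_to_end(bid)          # mark as recently used
            else:
                if len(cache) >= CACHE_CAPACITY:
                    cache.popitem(last=False)   # evict least recently used
                cache[bid] = load_block(bid)
                io_loads += 1
            point = advance(point, cache[bid])
            curve.append(point)
        curves.append(curve)
    return curves, io_loads
```

Nearby seeds reuse cached blocks: in this toy setting, a second seed started next to the first incurs no additional loads, while a distant seed pays for a fresh set of blocks.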
Pugmire et al. [16] provided a detailed analysis of the performance of POS over
a range of streamline problems. They make aggressive use of caching: new blocks
are only loaded when no streamline can be continued on the currently resident blocks.
Cache eviction follows least-recently-used (LRU) order.
The initial assignment of seed points to processors is chosen based on spatial
proximity, following the reasoning that the integral curves traced from spatially close
seed points are likely to traverse the same regions, and thus blocks, of a dataset. This
approach is effective in increasing block reuse, especially for dense seed sets. They
also sketch worst-case scenarios: for example, if streamlines travel in near-cycles,
and the number of blocks encountered along these cycles exceeds the size
of the cache, performance drops significantly.
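A minimal sketch of proximity-based seed assignment, under assumptions not taken from the source: a one-dimensional domain, a coarse grid cell as the proximity measure, and equal-sized contiguous chunks per processor. In three dimensions, ordering along a space-filling curve would serve the same purpose.

```python
# Hypothetical sketch: order seeds by coarse grid cell so that each
# processor receives a spatially compact chunk, increasing the chance
# that its integral curves share data blocks.
def assign_seeds(seeds, num_procs, cell=10.0):
    # Sort by coarse 1-D grid cell; a space-filling curve generalizes
    # this ordering to 3-D domains.
    ordered = sorted(seeds, key=lambda p: int(p // cell))
    chunk = (len(ordered) + num_procs - 1) // num_procs
    return [ordered[i * chunk:(i + 1) * chunk] for i in range(num_procs)]
```

With this ordering, seeds that fall into the same or neighboring cells end up on the same processor, which is what makes block reuse likely for dense seed sets.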
In general, caching can only be effective if block reuse is high; this is often the case
for stationary fields, but less so for time-varying fields, since curves advance
monotonically along the time axis and thus frequently traverse previously
unencountered spatiotemporal blocks.
A different approach to reducing I/O cost was described by Wolter et al. [21], who
introduced intelligent prefetching to overlap I/O and computation. By building
a Markov model of block transitions, they are able to predict likely future blocks
which can then be loaded ahead of time. Two strategies are proposed for initializing
the Markov model, both requiring pre-computation to build a transition probability
model. They report much improved performance in the stationary case, but point out
limitations for time-varying problems that stem from constructing the prediction
model on individual time slices.
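The transition model above can be sketched as a first-order Markov chain over block ids. This is an illustrative reconstruction, not the authors' code: the class name, the training data, and the choice to prefetch the top-k successors are assumptions.

```python
# Hypothetical sketch of Markov-model prefetching: from observed block
# transitions, estimate P(next block | current block) and prefetch the
# most likely successor blocks while integration proceeds in the current one.
from collections import Counter, defaultdict

class TransitionModel:
    def __init__(self):
        self.counts = defaultdict(Counter)  # block -> Counter of successors

    def record(self, cur, nxt):
        """Record one observed block transition (e.g. from pre-computation)."""
        self.counts[cur][nxt] += 1

    def predict(self, cur, k=2):
        """Return up to k most probable successor blocks of `cur`."""
        return [b for b, _ in self.counts[cur].most_common(k)]

model = TransitionModel()
# Toy transition observations gathered in a pre-computation pass (assumed).
for cur, nxt in [(0, 1), (0, 1), (0, 2), (1, 3)]:
    model.record(cur, nxt)
# Candidate blocks to prefetch while working in block 0:
candidates = model.predict(0)
```

An I/O thread would issue reads for `candidates` in the background, so that by the time a curve leaves block 0, its likely destination is already resident.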