They find that the communication cost incurred by block transitions plays a secondary role compared to the overall number of blocks traversed. Essentially, in the case of a dense seed set, streamlines in the same spatial regions must be computed by the comparatively few processors owning the required blocks, leading to a large load imbalance. A similar observation holds for vector fields with regions of strong convergence, which tends to concentrate trajectories in small regions of a data set. To counteract this effect, they propose assigning blocks to processors in round-robin fashion or at random.
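To illustrate, a minimal sketch of such an assignment follows, assuming blocks are identified by consecutive integer indices; the function name and signature are hypothetical, not taken from the cited work.

import random

def assign_blocks(num_blocks, num_procs, strategy="round_robin", seed=0):
    """Map each block index to an owning processor rank.

    Scattering spatially adjacent blocks across ranks spreads dense
    regions over many owners instead of overloading a few.
    """
    if strategy == "round_robin":
        # Block i goes to rank i mod P, so neighboring blocks land on
        # different processors.
        return {b: b % num_procs for b in range(num_blocks)}
    if strategy == "random":
        rng = random.Random(seed)  # fixed seed for reproducibility
        return {b: rng.randrange(num_procs) for b in range(num_blocks)}
    raise ValueError("unknown strategy: " + strategy)

# Usage: 8 blocks distributed over 3 processors.
print(assign_blocks(8, 3))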
Nouanesengsy et al. [14] provided an approach to address the load imbalances in POB by using workload-aware partitioning of the vector field data. Treating block assignment as an optimization problem, they compute a flow graph from an abstract representation of the vector field. In conjunction with a given seed set, the flow graph provides an estimate of the blocks that must be traversed to compute the curves, and the resulting set of blocks is then partitioned to minimize communication cost. Additionally, to leverage resources optimally, block replication is allowed, which addresses starvation problems. In experiments, they observe better performance and scalability compared to a round-robin assignment; however, they incur a non-negligible preprocessing cost per data set.
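The following sketch illustrates the general idea under strong simplifying assumptions: block-to-block transition probabilities stand in for the flow graph, and a greedy load-balancing pass stands in for the optimization described in [14]. All names and parameters are hypothetical.

import heapq
from collections import defaultdict

def estimate_block_work(flow_graph, seed_counts, num_steps):
    """Propagate seed counts through a block-level flow graph.

    flow_graph[b] lists (neighbor, probability) pairs: the fraction of
    curves expected to leave block b for each neighbor. The result
    approximates how many curve segments each block must integrate.
    """
    work = defaultdict(float, seed_counts)
    frontier = dict(seed_counts)
    for _ in range(num_steps):
        nxt = defaultdict(float)
        for block, count in frontier.items():
            for nbr, prob in flow_graph.get(block, []):
                nxt[nbr] += count * prob
        for block, count in nxt.items():
            work[block] += count
        frontier = nxt
    return work

def partition_blocks(work, num_procs):
    """Greedily assign the heaviest blocks to the least-loaded rank."""
    heap = [(0.0, rank) for rank in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    for block in sorted(work, key=work.get, reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[block] = rank
        heapq.heappush(heap, (load + work[block], rank))
    return assignment

# Usage: three blocks in a chain, all seeds placed in block 0.
graph = {0: [(1, 0.8)], 1: [(2, 0.8)]}
work = estimate_block_work(graph, {0: 100.0}, num_steps=3)
print(partition_blocks(work, num_procs=2))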
While it is straightforward to implement, the POB algorithm has one severe limitation: if the combined memory of all processors cannot accommodate the entire data set, the algorithm cannot be used. This is especially the case for time-varying vector fields.
26.3.3 Adaptive Load Balancing
As is apparent from the discussion above, POS and POB represent two extremes of a spectrum of parallelization schemes, focusing exclusively on either work distribution or data distribution. To address the respective shortcomings of either choice, Pugmire et al. [16] describe an adaptive algorithm that dynamically distributes both integral curves and blocks, attempting to balance I/O and communication loads at run-time based on heuristics.
Algorithmically, a master processor coordinates the workloads of a fixed set of slave processors. The master initially determines a seed point assignment for the slaves; as work progresses, it monitors the length of each slave's assigned integral curve set and the blocks that are loaded. To avoid overload and starvation, it then reassigns individual curves or entire blocks among slaves. Once the master determines that all curves have been computed, it instructs the slaves to terminate. This adaptive approach entails significant complexity, which is reflected in complicated heuristics. The authors also investigate the scalability of their approach, but find that using a single master becomes a bottleneck as the number of processors grows. They postulate that a hierarchy of master processes should perform better at scale. Furthermore, the load-balancing heuristics rely on several machine-dependent parameters, such as I/O and communication latency and speed.
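As a rough illustration of the coordination pattern, and not of the authors' actual heuristics, the following sketch rebalances pending curves from overloaded slaves to starved ones; the overload threshold and data layout are hypothetical.

def rebalance(queues, overload_factor=2.0):
    """Move pending curves from overloaded slaves to starved ones.

    queues maps a slave rank to its list of pending curve ids; the
    master would call this after gathering status reports. Returns
    the number of curves moved. The threshold is illustrative only.
    """
    avg = sum(len(q) for q in queues.values()) / len(queues)
    starved = [rank for rank, q in queues.items() if not q]
    moved = 0
    for idle in starved:
        # Pick the currently most loaded slave as the donor.
        busy = max(queues, key=lambda rank: len(queues[rank]))
        if len(queues[busy]) <= max(1, overload_factor * avg):
            break  # no slave is overloaded enough to donate work
        queues[idle].append(queues[busy].pop())
        moved += 1
    return moved

# Usage: four slaves; rank 0 is heavily loaded, ranks 1 and 3 starve.
queues = {0: list(range(12)), 1: [], 2: [100], 3: []}
print(rebalance(queues), {rank: len(q) for rank, q in queues.items()})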
 