Scientific Data Management Challenges in High-Performance Visual Data Analysis - Scientific Data Management


When run in a parallel environment, visualization tools are expected to adapt to available resources (e.g., number of processors) and partition the data for processing in a way that achieves good load balance.
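
As a rough illustration of what "good load balance" can mean, the sketch below assigns data blocks of unequal size to processors with a greedy largest-first heuristic and reports the resulting imbalance. This is a generic textbook heuristic chosen for illustration, not the method used by any particular visualization tool; the names `assign_blocks` and `imbalance` and the sample block sizes are hypothetical.

```python
import heapq

def assign_blocks(block_sizes, n_procs):
    """Greedy largest-first assignment of data blocks to processors.

    Returns one list of block indices per processor.
    """
    # Min-heap of (current load, rank): the least-loaded processor pops first.
    heap = [(0, rank) for rank in range(n_procs)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    # Place large blocks first so small ones can even out the remainder.
    for idx in sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i]):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + block_sizes[idx], rank))
    return assignment

def imbalance(block_sizes, assignment):
    """Ratio of the heaviest processor load to the mean load (1.0 is perfect)."""
    loads = [sum(block_sizes[i] for i in blocks) for blocks in assignment]
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical block sizes (e.g., cell counts per block), two processors.
sizes = [8, 7, 6, 5, 4]
assignment = assign_blocks(sizes, n_procs=2)
ratio = imbalance(sizes, assignment)
```

The same function adapts to the available resources simply by changing `n_procs`; a ratio close to 1.0 indicates that no processor is left waiting on a much more heavily loaded one.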

In the following subsections, we describe how these visualization tools use the data from scientific simulations. In particular, we will discuss:

- How a production-level, parallel visualization tool loads data, processes it, and produces results
- How a production-level, parallel visualization tool can optimize its data management and processing in the presence of metadata
- The importance of data semantics from the perspective of a production-level, parallel visualization tool

9.2.1 How Data Is Processed

The three major parallelized, production-level visualization tools (EnSight [3], VisIt [4], and ParaView [5]) all employ similar strategies. They use a client-server design, where the client provides a user interface on the user's desktop, and the server runs where the data is located, which is assumed to have resources for parallel processing. The general data management strategy for the parallel server can essentially be described as a scatter-gather algorithm. The process can be characterized in three steps:

1. I/O (scatter): load data (in parallel) onto the server
2. Processing: employ visualization and analysis algorithms; transform the data to geometry
3. Rendering (gather): transform the geometry into images
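
The three steps can be sketched as a toy pipeline. The code below is only a minimal illustration under strong simplifying assumptions: the "dataset" is a one-dimensional list of scalars, "geometry" is the set of sample positions above a threshold (a stand-in for something like isosurface extraction), the "image" is a row of 0/1 pixels, and the ranks are simulated by a sequential loop rather than real parallel processes. All function names are hypothetical and do not correspond to the API of EnSight, VisIt, or ParaView.

```python
def load_portion(dataset, rank, n_procs):
    """I/O (scatter): each rank reads only its own contiguous slice."""
    start = rank * len(dataset) // n_procs
    stop = (rank + 1) * len(dataset) // n_procs
    return start, dataset[start:stop]

def extract_geometry(offset, values, threshold):
    """Processing: turn data into 'geometry', here the global indices
    of the samples whose value exceeds the threshold."""
    return [offset + i for i, v in enumerate(values) if v > threshold]

def render(geometry, width):
    """Rendering: rasterize the local geometry into a partial image."""
    image = [0] * width
    for x in geometry:
        image[x] = 1
    return image

def composite(partials):
    """Gather: combine the per-rank partial images into the final image."""
    return [max(pixels) for pixels in zip(*partials)]

def run_pipeline(dataset, n_procs, threshold):
    partials = []
    for rank in range(n_procs):  # simulated ranks; a real server runs these concurrently
        offset, values = load_portion(dataset, rank, n_procs)
        geometry = extract_geometry(offset, values, threshold)
        partials.append(render(geometry, width=len(dataset)))
    return composite(partials)

# Hypothetical sample data: the result should not depend on the rank count.
data = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3]
image = run_pipeline(data, n_procs=3, threshold=0.5)
```

In this sketch each rank touches only its own slice until the final compositing step, which is the only point where results from all ranks are brought together.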

9.2.1.1 I/O

Since visualization is a data-intensive endeavor, I/O is frequently the slowest and most expensive part of the entire visualization pipeline. As such, it is advantageous to parallelize the data loading. A typical design pattern is for each processor of the parallel server to read a portion of the input dataset, which is the mechanism that “scatters” the dataset across the processors. The key question during the I/O phase is how to assign portions of the input dataset to the processors of the server. We simplify the discussion below by assuming that the data is being read from disk, that is, not being processed in situ as part of a single program with the simulation code.
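
One simple answer to the assignment question, when the file layout imposes no constraints, is a contiguous block decomposition in which each processor seeks to its own offset and reads only its share. The sketch below assumes a flat binary file of 64-bit floats and simulates the ranks with a loop; the file name `field.bin` and the helpers `block_range` and `read_portion` are hypothetical, and real simulation output formats are usually more structured than this.

```python
import os
import tempfile
from array import array

def block_range(n_items, n_procs, rank):
    """Contiguous [start, stop) range for `rank`; sizes differ by at most one."""
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def read_portion(path, n_items, n_procs, rank):
    """Read only this rank's slice of a flat file of float64 values."""
    start, stop = block_range(n_items, n_procs, rank)
    values = array("d")
    with open(path, "rb") as f:
        f.seek(start * values.itemsize)  # skip the data owned by lower ranks
        values.fromfile(f, stop - start)
    return values

# Demo: write 10 values to disk, then let 4 simulated ranks read their parts.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "field.bin")
    with open(path, "wb") as f:
        array("d", range(10)).tofile(f)
    portions = [list(read_portion(path, 10, 4, r)) for r in range(4)]
```

Because every rank computes its range from `n_items`, `n_procs`, and its own rank alone, no communication is needed to decide who reads what, and the portions together cover the file exactly once.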

When the visualization server processes a portion of the dataset, the input dataset must be partitioned and distributed across the server's processors. When the simulation outputs data, it may impose restrictions on data partitioning. There are two typical scenarios:
