Scientific Data Management Challenges in High-Performance Visual Data Analysis - Scientific Data Management


selection based upon arbitrary range conditions on the data values contained in the datasets using the bitmap indexes. As a result, HDF5 FastQuery supports fast execution of searches based upon compound queries that span multiple datasets. The API also allows us to seamlessly integrate the FastBit query mechanism for data selection with HDF5's standard hyperslab selection mechanism. Using the HDF5 FastQuery API, one can quickly select subsets of data from an HDF5 file using text-string queries.
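To make the combination of value-based selection and hyperslab selection concrete, the following is a minimal sketch in plain Python. It does not use HDF5 or the HDF5 FastQuery API; the helper names `hyperslab_indices` and `value_selection` and the sample data are hypothetical, chosen only for illustration. It treats both kinds of selection as sets of flat element indices and intersects them.

```python
# Illustrative sketch only: plain Python stand-ins, not HDF5 or FastQuery calls.

def hyperslab_indices(shape, start, count):
    """Flat row-major indices of a rectangular 2D block (hyperslab-like region)."""
    rows, cols = shape
    return {
        r * cols + c
        for r in range(start[0], start[0] + count[0])
        for c in range(start[1], start[1] + count[1])
    }

def value_selection(values, predicate):
    """Flat indices of all elements whose value satisfies the predicate."""
    return {i for i, v in enumerate(values) if predicate(v)}

# A hypothetical 3 x 4 temperature dataset stored in row-major order.
shape = (3, 4)
temperature = [
    900, 1100, 1200, 800,
    1500, 700, 1300, 1050,
    950, 1250, 600, 1400,
]

# Spatial restriction: rows 0-1, columns 1-3.
slab = hyperslab_indices(shape, start=(0, 1), count=(2, 3))
# Value restriction: temperature > 1000.
hot = value_selection(temperature, lambda t: t > 1000)

# Elements that satisfy both the spatial and the value condition.
selected = sorted(slab & hot)
print(selected)
```

In this sketch the value condition is evaluated by scanning every element; the point of a bitmap index is to answer that part without a full scan.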

The bitmap indexes are created and stored through a single call to the HDF5 FastQuery API. The indexes are stored in separate arrays in the same file as the datasets they refer to and are opaque to the general HDF5 functions. It is important to note that all such indexes must be built before any queries are posed to the API. Once the bitmap indexes have been built and stored in the data file, queries are posed to the API as a text string such as (temperature > 1000) AND (70 < pressure < 90), where the names specified in the range query correspond to the names of the datasets in the HDF5 file. The HDF5 FastQuery interface uses the stored bitmap indexes that correspond to the specified dataset to accelerate the selection of elements in the datasets that meet the search criteria. An accelerated query on the contents of a dataset requires only small portions of the compressed bitmap indexes to be read into memory, so extremely large datasets can be searched with little memory overhead. The query engine then generates an HDF5 selection that is used to read only the elements from the dataset that are specified by the query string.
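The idea of answering a compound range query from per-dataset bitmap indexes can be sketched as follows. This is an illustration under simplifying assumptions, not the FastBit or HDF5 FastQuery implementation: the bitmaps here are uncompressed Python integers with one bitmap per distinct value, and the function names `build_index`, `range_bitmap`, and `bits_to_indices` as well as the sample data are hypothetical.

```python
# Illustrative sketch only: uncompressed bitmaps held in Python integers.
from bisect import bisect_left, bisect_right

def build_index(values):
    """One bitmap per distinct value; bit i is set when values[i] equals that value."""
    bitmaps = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    keys = sorted(bitmaps)
    return keys, [bitmaps[k] for k in keys]

def range_bitmap(index, low, high):
    """OR together the bitmaps of all values v with low < v < high."""
    keys, maps = index
    first = bisect_right(keys, low)   # first key strictly greater than low
    last = bisect_left(keys, high)    # first key greater than or equal to high
    result = 0
    for bitmap in maps[first:last]:
        result |= bitmap
    return result

def bits_to_indices(bitmap):
    """Positions of the set bits, i.e. the selected element indices."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two hypothetical datasets of equal length.
temperature = [950, 1200, 1500, 1001, 800, 1300]
pressure = [75, 85, 95, 71, 80, 60]

# Indexes are built once, before any query is posed.
t_index = build_index(temperature)
p_index = build_index(pressure)

# (temperature > 1000) AND (70 < pressure < 90)
hits = range_bitmap(t_index, 1000, float("inf")) & range_bitmap(p_index, 70, 90)
selected = bits_to_indices(hits)
print(selected)
```

The final list of indices plays the role of the selection that is then used to read only the matching elements; the raw data arrays are never scanned during the query itself.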

9.4.2.2 Architectural Layout

In this section, we present a high-level view of the HDF5 FastQuery architectural layout. We begin by defining relevant terms used throughout the architectural layout as well as the HDF5 FastQuery API.

Groups are the logical way to organize data in an HDF5 file. We use the term group or grouping to refer to this logical structuring. These groups act as containers of various types of metadata, which in our approach are specific to a given dataset. Note that these groups may be assigned type information (float, int, string, etc.) to uniquely describe these datasets.

Variables vs. Attributes. The properties assigned to a specific group (i.e., group metadata) are called attributes or group attributes. For all datasets, the specific physical properties that the dataset quantizes (density, pressure, helicity, etc.) will be referred to as dataset variables. To organize a given multivariate dataset consisting of a discrete range of time steps, a division is made between the raw data and the attributes that describe the data. This division is represented in the architectural layout by the separation and formation of two classes of groups: the TimeStep groups for the raw data, and the VariableDescriptor groups for the metadata used to describe the dataset variables. For the dataset variables, one VariableDescriptor group is created for each variable (pressure, velocity, etc.). The metadata saved under these groups
