Database Reference
In-Depth Information
selection based upon arbitrary range conditions on the data values contained
in the datasets using the bitmap indexes. As a result, HDF5 FastQuery supports
fast execution of searches based upon compound queries that span multiple
datasets. The API also allows us to seamlessly integrate the FastBit query
mechanism for data selection with HDF5's standard hyperslab selection
mechanism. Using the HDF5 FastQuery API, one can quickly select subsets of
data from an HDF5 file using text-string queries.
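To make the combination of a value-based query with a hyperslab selection concrete, the following is a minimal sketch in plain Python. It does not call HDF5 or the HDF5 FastQuery API. The helper hyperslab_indices and the query_hits set are hypothetical names introduced for illustration, and the sketch assumes that combining the two selections amounts to intersecting them, which the text does not spell out.

```python
def hyperslab_indices(start, stride, count, block=1):
    """Element indices of a 1-D hyperslab: `count` blocks of `block`
    consecutive elements, with block origins `stride` elements apart."""
    return {start + i * stride + j for i in range(count) for j in range(block)}

# Hypothetical result of a value query: indices of the matching elements.
query_hits = {2, 3, 8, 9, 10, 17}

# Hyperslab covering elements 0-1, 4-5, 8-9, 12-13, and 16-17.
slab = hyperslab_indices(start=0, stride=4, count=5, block=2)

# Assumption: the combined selection is the intersection of the two.
combined = sorted(query_hits & slab)
print(combined)  # [8, 9, 17]
```

The start, stride, count, and block parameters mirror the way HDF5 describes a regular hyperslab; everything else here is a stand-in.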
The bitmap indexes are created and stored through a single call to the
HDF5 FastQuery API. The indexes are stored in separate arrays in the same
file as the datasets they refer to and are opaque to the general HDF5
functions. Note that all such indexes must be built before
any queries are posed to the API. Once the bitmap indexes have been built
and stored in the data file, queries are posed to the API as a text string such
as (temperature > 1000) AND (70 < pressure < 90), where the names specified in
the range query correspond to the names of the datasets in the HDF5 file. The
HDF5 FastQuery interface uses the stored bitmap indexes that correspond to the
specified dataset to accelerate the selection of elements in the datasets that
meet the search criteria. An accelerated query on the contents of a dataset
requires only small portions of the compressed bitmap indexes to be read into
memory, so extremely large datasets can be searched with little memory
overhead. The query engine then generates an HDF5 selection that is used to
read only the elements from the dataset that are specified by the query string.
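The way a bitmap index can answer the compound query (temperature > 1000) AND (70 < pressure < 90) can be illustrated with a small, self-contained sketch. It is not the FastBit or HDF5 FastQuery implementation. It assumes a simple binned index in which each bin owns one uncompressed bitmap (stored as a Python integer), and it omits compression and file storage entirely. The function names, bin edges, and sample values are all hypothetical.

```python
from bisect import bisect_right

def build_index(values, edges):
    """One bitmap (a Python int) per bin; bit i is set when values[i] is in the bin."""
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << i
    return bitmaps

def range_query(values, edges, bitmaps, lo=None, hi=None):
    """Bitmap of rows with lo < values[i] < hi; a bound of None is unbounded."""
    first = 0 if lo is None else bisect_right(edges, lo)
    last = len(edges) if hi is None else bisect_right(edges, hi)
    hits = 0
    # Bins strictly between the two edge bins lie fully inside the range.
    for k in range(first + 1, last):
        hits |= bitmaps[k]
    # Edge bins may hold non-matching rows, so check their raw values.
    for k in {first, last}:
        bits, i = bitmaps[k], 0
        while bits:
            if bits & 1 and (lo is None or values[i] > lo) and (hi is None or values[i] < hi):
                hits |= 1 << i
            bits >>= 1
            i += 1
    return hits

def rows(bitmap):
    """Positions of the set bits, i.e. the selected element indices."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# Hypothetical sample data and bin edges.
temperature = [950.0, 1200.0, 1500.0, 800.0, 1100.0, 1001.0]
pressure = [75.0, 85.0, 95.0, 80.0, 60.0, 89.5]
t_edges = [500.0, 1000.0, 1500.0]
p_edges = [60.0, 80.0, 100.0]

t_index = build_index(temperature, t_edges)
p_index = build_index(pressure, p_edges)

hot = range_query(temperature, t_edges, t_index, lo=1000.0)
mid = range_query(pressure, p_edges, p_index, lo=70.0, hi=90.0)

# The AND in the query string becomes a bitwise AND of the two bitmaps.
selected = rows(hot & mid)
print(selected)  # [1, 5]
```

In this sketch, interior bins are resolved from the index alone, and only the two edge bins require a look at the raw values. The list of selected positions plays the role of the selection that would then be used to read only the matching elements.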
9.4.2.2 Architectural Layout
In this section, we present a high-level view of the HDF5 FastQuery
architectural layout. We begin by defining relevant terms used throughout the
architectural layout as well as the HDF5 FastQuery API.
Groups are the logical way to organize data in an HDF5 file. We use the
term group or grouping to refer to this logical structuring. These groups act
as containers of various types of metadata, which in our approach are specific
to a given dataset. Note that these groups may be assigned type information
(float, int, string, etc.) to uniquely describe these datasets.
Variables vs. Attributes. The properties assigned to a specific group (i.e.,
group metadata) are called attributes or group attributes. For all datasets,
the specific physical properties that the dataset quantizes (density, pressure,
helicity, etc.) will be referred to as dataset variables. To organize a given
multivariate dataset consisting of a discrete range of time steps, a division
is made between the raw data and the attributes that describe the data. This
division is represented in the architectural layout by the separation and
formation of two classes of groups: the TimeStep groups for the raw data, and
the VariableDescriptor groups for the metadata used to describe the dataset
variables.
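The split between the two classes of groups can be pictured with a short sketch that lists the paths such a file might contain. The path names (/VariableDescriptor/..., /TimeStep0/...) and the helper sketch_layout are hypothetical and chosen only for illustration. The actual group names and nesting used by HDF5 FastQuery may differ.

```python
def sketch_layout(variables, num_steps):
    """Return hypothetical paths for the two classes of groups."""
    paths = []
    # Metadata side: one descriptor entry per dataset variable (assumed naming).
    for name in variables:
        paths.append(f"/VariableDescriptor/{name}")
    # Raw-data side: one group per time step, holding one dataset per variable.
    for t in range(num_steps):
        for name in variables:
            paths.append(f"/TimeStep{t}/{name}")
    return paths

paths = sketch_layout(["pressure", "density"], num_steps=2)
for p in paths:
    print(p)
```

Keeping the descriptors apart from the per-time-step data means the metadata for a variable is written once rather than repeated in every time step.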
For the dataset variables, one VariableDescriptor group is created for each
variable (pressure, velocity, etc.). The metadata saved under these groups