the performance is not always as good as writing one file per processor, we
have shown that writing files with Parallel HDF5 is consistently faster than
writing the data in raw/native binary using the MPI-IO library [29]. This
efficiency is made possible by sophisticated HDF5 tuning directives, transparent
to the parallel application, that control data alignment and caching within the
HDF5 layer. Therefore, we argue that it would be difficult to match HDF5
performance even with a home-grown binary file format.
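One of the alignment directives referred to above is, in the HDF5 C library, H5Pset_alignment(fapl_id, threshold, alignment) on a file access property list: any file object at least threshold bytes in size is placed at a file address that is a multiple of alignment. The Python sketch below models only that placement rule on a hypothetical sequential allocator. The helper names aligned_address and layout are illustrative and not part of HDF5, and real HDF5 free-space management is more involved.

```python
def aligned_address(next_free: int, size: int, threshold: int, alignment: int) -> int:
    """Address at which an object of `size` bytes is placed under the rule
    "objects >= threshold bytes start at a multiple of alignment".

    Simplified model of H5Pset_alignment semantics, not HDF5 code.
    """
    if alignment <= 1 or size < threshold:
        return next_free  # small objects are packed at the next free byte
    # Round next_free up to the next multiple of alignment.
    return ((next_free + alignment - 1) // alignment) * alignment


def layout(sizes, threshold: int, alignment: int, start: int = 0):
    """Place objects one after another (hypothetical allocator).

    Returns a list of (address, size) pairs.
    """
    addr = start
    placed = []
    for size in sizes:
        a = aligned_address(addr, size, threshold, alignment)
        placed.append((a, size))
        addr = a + size  # next free byte after this object
    return placed


if __name__ == "__main__":
    MiB = 1 << 20
    # Two large datasets interleaved with small metadata objects.
    for addr, size in layout([100, 5_000_000, 64, 5_000_000], MiB, MiB):
        print(f"{size:>9} bytes at offset {addr}")
```

Setting the alignment to the parallel file system's block or stripe size is commonly suggested so that large dataset writes do not straddle stripe boundaries, while small metadata objects stay densely packed because they fall below the threshold.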
9.4.2 HDF5 FastQuery
Large-scale scientific data is often stored in scientific data formats like FITS,
netCDF, and HDF. These storage formats are of particular interest to the
scientific user community since they provide multidimensional storage and
retrieval capabilities. However, one drawback of these formats is that they
cannot extract subsets of data that satisfy multidimensional, compound range
conditions. Such multidimensional range
conditions are often the basis for defining “features of interest,” which are the
focus of scientific inquiry and study.
HDF5 FastQuery [30] is a high-level API that provides multidimensional indexing
and searching on large HDF5 files. It leverages an efficient bitmap indexing
technology called FastBit [31-33] (described in Chapter 6) that has been widely
used in the database community. Bitmap indexes are
especially well suited for interactive exploration of large-scale read-only data.
Storing the bitmap indexes into the HDF5 file has the following advantages:
(1) significant performance speed-up of accessing subsets of multidimensional
data and (2) portability of the indexes across multiple computer platforms.
The HDF5 FastQuery API simplifies the execution of queries on HDF5 files for
general scientific applications and data analysis. The design is flexible enough
to accommodate the use of arbitrary indexing technology for semantic range
queries.
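To make the idea of a bitmap index concrete, the sketch below builds a toy binned bitmap index: each bin of values owns one bit vector marking which elements fall into it, and a range query ORs together the bitmaps of bins fully inside the range, checking raw values only for the boundary bins. This is a simplified illustration, not FastBit's implementation: FastBit stores its bitmaps compressed and supports several encodings. The class BinnedBitmapIndex, the helper positions, and the bin edges are all hypothetical.

```python
from bisect import bisect_right


def positions(bits: int) -> list:
    """Indices of the set bits in a bit vector stored as a Python int."""
    out, j = [], 0
    while bits:
        if bits & 1:
            out.append(j)
        bits >>= 1
        j += 1
    return out


class BinnedBitmapIndex:
    """Toy, uncompressed binned bitmap index over a flat sequence of values.

    Bin i covers [edges[i], edges[i + 1]) and owns one bitmap whose bit j is
    set iff values[j] lies in that bin. Assumes sorted edges covering the data.
    """

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for j, v in enumerate(self.values):
            b = bisect_right(self.edges, v) - 1  # bin that contains v
            if 0 <= b < len(self.bitmaps):
                self.bitmaps[b] |= 1 << j

    def query(self, lo, hi) -> int:
        """Bit vector of the positions j with lo < values[j] < hi."""
        hits = 0
        for i, bm in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[i], self.edges[i + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin lies entirely outside the query range
            if b_lo > lo and b_hi <= hi:
                hits |= bm  # bin lies entirely inside: take the whole bitmap
            else:
                # Boundary bin: verify each member against the raw data
                # (a "candidate check").
                for j in positions(bm):
                    if lo < self.values[j] < hi:
                        hits |= 1 << j
        return hits


if __name__ == "__main__":
    pressure = [65.0, 72.5, 88.0, 91.0, 70.0, 80.0]
    idx = BinnedBitmapIndex(pressure, edges=[0, 60, 70, 80, 90, 100])
    print(positions(idx.query(70, 90)))  # [1, 2, 5]
```

Coarser bins make the index smaller but push more work into the candidate check on the boundary bins; this size-versus-precision trade-off is one reason binned bitmap indexes suit large, read-only scientific data.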
HDF5 FastQuery provides an interface to support semantic indexing for
HDF5 via a query API. HDF5 FastQuery allows users to efficiently generate
complex selections on HDF5 datasets using compound range queries like
(energy > 10^5) AND (70 < pressure < 90) and retrieve only the subset
of data elements that meet the query conditions. The FastBit technology
generates the compressed bitmap indexes that accelerate searches on HDF5
datasets, and these indexes are stored together with the datasets in the
HDF5 file. Compared with other indexing schemes, compressed bitmap indexes
are compact and well suited for searching over multidimensional data, even
for arbitrarily complex combinations of range conditions.
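The compound query (energy > 10^5) AND (70 < pressure < 90) can be sketched as one bit vector per condition, combined with a bitwise AND, with the surviving flat offsets mapped back to n-dimensional coordinates (row-major, as in HDF5). Assumptions: the function names are hypothetical, range_bitmap scans the raw values as a stand-in for the per-variable index lookup that FastBit would perform, and the datasets are small in-memory lists rather than HDF5 datasets.

```python
import math


def range_bitmap(values, lo=-math.inf, hi=math.inf) -> int:
    """Bit vector with bit j set iff lo < values[j] < hi.

    Stand-in for a bitmap-index lookup; a real index avoids this full scan.
    """
    bits = 0
    for j, v in enumerate(values):
        if lo < v < hi:
            bits |= 1 << j
    return bits


def unravel(flat: int, shape) -> tuple:
    """Convert a row-major (C-order) flat offset into an n-D coordinate."""
    coord = []
    for extent in reversed(shape):
        flat, r = divmod(flat, extent)
        coord.append(r)
    return tuple(reversed(coord))


def compound_query(datasets, conditions, shape) -> list:
    """AND together open-interval conditions over same-shaped datasets.

    datasets:   name -> flat row-major list of values
    conditions: name -> (lo, hi), meaning lo < value < hi
    Returns the coordinates of the elements that satisfy every condition.
    """
    hits = (1 << math.prod(shape)) - 1  # start with every element selected
    for name, (lo, hi) in conditions.items():
        hits &= range_bitmap(datasets[name], lo, hi)
    coords, j = [], 0
    while hits:
        if hits & 1:
            coords.append(unravel(j, shape))
        hits >>= 1
        j += 1
    return coords


if __name__ == "__main__":
    data = {
        "energy": [2e5, 5e4, 3e5, 1e6, 9e4, 1.5e5],   # a 2 x 3 dataset
        "pressure": [75, 80, 95, 85, 72, 60],
    }
    query = {"energy": (1e5, math.inf), "pressure": (70, 90)}
    print(compound_query(data, query, shape=(2, 3)))  # [(0, 0), (1, 0)]
```

A coordinate list like this is the kind of input an HDF5 point selection (H5Sselect_elements) accepts, so only the matching elements would need to be read from the file.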
9.4.2.1 Functionality
HDF5 supports slab and hyperslab selections of n-dimensional datasets.
HDF5 FastQuery extends the HDF5 selection mechanism to allow subset