Database Reference
and body parts. There are four essential classes of information in a header:
name, data type, dataspace, and storage layout. The name of the array is a
text string. The data type describes the numerical type of array elements,
which can be atomic, native, compound, or named. Atomic data types are
the primitive data types, such as integers and floats. Native data types are
system-specific instances of atomic data types. Compound data types are
collections of atomic data types. Named data types are either atomic or
compound data types that can be shared across arrays. A dataspace describes
the dimensionality of an array. Unlike netCDF, all dimensions of an HDF5
array can be either fixed or unlimited. The storage layout specifies the way
a multidimensional array is stored in a file. The default storage layout
is contiguous, meaning that data is stored in the same linear way that
it is organized in memory. The other storage layout is called chunked, in
which an array is divided into equal-sized chunks, and chunks are stored
separately in the file. Chunking has three important benefits. First, it
makes good performance possible when accessing noncontiguous subsets of
the array. Second, it allows large arrays to be compressed. Third, it
enables an array to be extended along any dimension. Chunking also
applies to headers. Therefore, HDF5 headers can be dispersed in separate
header blocks for each object and are not limited to the beginning
of the file. Another important feature of HDF5 is called grouping. A
collection of arrays can be grouped together, and a group may contain a number
of arrays and other groups that are organized in a tree-based hierarchical
structure.
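
The chunked layout described above can be illustrated with a small sketch. This is a toy model in plain Python, not HDF5's implementation: the function names chunk_of and chunks_touched, the 4 x 4 chunk shape, and the array sizes are all assumptions chosen for illustration. It shows how an element index maps to a chunk, how many chunks a noncontiguous subset (one column) touches, and why extending an array only adds chunks.

```python
from itertools import product

def chunk_of(index, chunk_shape):
    """Return (chunk coordinate, offset inside that chunk) for an element index."""
    coord = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return coord, offset

def chunks_touched(start, count, chunk_shape):
    """Return the set of chunk coordinates covered by a rectangular subset.

    `start` is the first element of the subset and `count` is its extent
    in each dimension (hypothetical parameter names for this sketch).
    """
    ranges = [range(s // c, (s + n - 1) // c + 1)
              for s, n, c in zip(start, count, chunk_shape)]
    return set(product(*ranges))

chunk_shape = (4, 4)  # assumed chunk size for the example

# Locate one element of a two-dimensional array.
print(chunk_of((5, 10), chunk_shape))

# One full column of a 16 x 16 array: only the chunks in that
# chunk-column need to be read.
column = chunks_touched((0, 3), (16, 1), chunk_shape)
print(len(column))

# Extending the array from 16 x 16 to 16 x 24 adds new chunks;
# every existing chunk keeps its coordinate.
before = chunks_touched((0, 0), (16, 16), chunk_shape)
after = chunks_touched((0, 0), (16, 24), chunk_shape)
print(len(before), len(after), before <= after)
```

In this model, the extended array's chunk set is a superset of the original one, which is the property that lets a chunked array grow along any dimension without rewriting the data already stored.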
HDF5 APIs are divided into 12 categories: general purpose, attributes,
datasets, error handling, file access, grouping, object identifiers, property list,
references, dataspace, data type, and filter/compression. Writing an HDF5
file comprises the following steps: create a file, create groups, define
dataspaces, define data types, create arrays, write attributes, write array data, and
close the file. The parallel data access support in HDF5 is built on top of
MPI-IO to ensure the file's portability. However, HDF5 does not separate
its file access routines into collective and independent versions as MPI-IO
does. Parallel I/O is enabled through the setting of properties passed to the
file open and data access APIs. The properties tell HDF5 to perform I/O
collectively or independently. Similar to PnetCDF, HDF5 allows accessing
a subarray in a single I/O call, which is achieved by defining hyperslabs
in the dataspace. HDF5's chunking allows writing subarrays without
reorganizing them into a global canonical order in the file. The advantage of
chunking becomes significant when the storage layout is orthogonal to the
access layout, for example, when a two-dimensional array is stored in
row-major order and accessed in column-major order. However,
this high degree of flexibility in HDF5 can sometimes come at the expense of
performance [13, 27].
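
The hyperslab idea and the row-major versus column-major mismatch can be sketched in a few lines. This is a plain-Python model, not a call into the HDF5 library: the start, stride, count, and block parameters mirror the way HDF5 describes a regular hyperslab, while the function names, the 8 x 8 array, and the 4 x 4 chunk shape are assumptions made for the example. The sketch does not check HDF5's own constraints on these parameters.

```python
from itertools import product

def hyperslab(start, stride, count, block):
    """List the element coordinates selected by a regular hyperslab.

    Per dimension: `count` blocks of `block` elements each, the first
    block at `start`, consecutive blocks `stride` elements apart.
    """
    per_dim = [
        [s + k * st + b for k in range(n) for b in range(bl)]
        for s, st, n, bl in zip(start, stride, count, block)
    ]
    return list(product(*per_dim))

def contiguous_runs(coords, shape):
    """Count runs of consecutive offsets in a row-major contiguous layout."""
    offsets = sorted(r * shape[1] + c for r, c in coords)
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)

def chunks_touched(coords, chunk_shape):
    """Count the distinct chunks that hold the selected elements."""
    return len({(r // chunk_shape[0], c // chunk_shape[1]) for r, c in coords})

shape = (8, 8)  # assumed two-dimensional array stored in row-major order

# One row and one column, each expressed as a single hyperslab.
row = hyperslab(start=(2, 0), stride=(1, 1), count=(1, 8), block=(1, 1))
column = hyperslab(start=(0, 2), stride=(1, 1), count=(8, 1), block=(1, 1))

print(contiguous_runs(row, shape))     # row access matches the storage order
print(contiguous_runs(column, shape))  # column access is split into many runs
print(chunks_touched(column, (4, 4)))  # chunked layout: a few whole chunks
```

Under these assumptions the row is one contiguous run, the column breaks into eight separate runs in the contiguous layout, and the same column is covered by two 4 x 4 chunks, which is the situation in which chunking pays off.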