2.4.4 High-Level I/O Libraries
Most file systems treat a file as a linear sequence of bytes. Applications are
responsible for interpreting the bytes as logical structures, for instance a
two-dimensional array of floating-point numbers. Without metadata to describe
the logical data structures, a program has difficulty telling what the bytes
represent. Therefore, to ensure portability, a file's metadata must accompany
the file at all times. This requirement is particularly important for
scientific data because many scientific data libraries, such as those for
visualization and data mining, manipulate data at a higher level than byte
streams.
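The role of metadata can be illustrated with a minimal sketch. It assumes a made-up container (the `DEMO` tag, the two-integer shape header, and the helper names `pack_array` and `unpack_array` are all hypothetical and belong to no real file format) and shows that the same bytes are ambiguous until a shape and element type travel with them.

```python
import struct

MAGIC = b"DEMO"  # hypothetical 4-byte tag, not a real file format


def pack_array(rows, cols, values):
    """Prefix a row-major array of float64 values with its shape."""
    if len(values) != rows * cols:
        raise ValueError("shape does not match number of values")
    header = MAGIC + struct.pack("<II", rows, cols)   # the metadata
    body = struct.pack(f"<{rows * cols}d", *values)   # the raw bytes
    return header + body


def unpack_array(blob):
    """Recover the 2-D array using only the metadata stored in the blob."""
    if blob[:4] != MAGIC:
        raise ValueError("unknown format")
    rows, cols = struct.unpack_from("<II", blob, 4)
    flat = struct.unpack_from(f"<{rows * cols}d", blob, 12)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


blob = pack_array(2, 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
grid = unpack_array(blob)

# Without the header, the 48 body bytes could just as well be read as
# twelve 32-bit floats; nothing in the bytes themselves rules that out.
misread = struct.unpack("<12f", blob[12:])
```

With the header attached, `unpack_array` needs no outside knowledge to rebuild the 2 x 3 grid, which is the portability property the paragraph above describes.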
This section describes two popular scientific data libraries, parallel netCDF
and HDF5. Both libraries store metadata along with data in the same files.
In addition, both define their own file formats and a set of APIs to access the
files, sequentially as well as in parallel.
2.4.4.1 Parallel netCDF
The network common data form (netCDF) was developed at the Unidata Program
Center. 17,18 The goal of netCDF is to define a portable file format so that
scientists can share data across different machine platforms. Atmospheric
science applications, for example, use netCDF to store a variety of data
types that encompass single-point observations, time series, regularly spaced
grids, and satellite or radar images. 19 Many organizations, including much
of the climate community, rely on the netCDF data access standard for data
storage. 20 However, netCDF does not provide adequate parallel I/O methods.
For a parallel write to a shared netCDF file, applications must serialize
access by passing all the data to a single process that then writes to the
file. This serial I/O access is both slow and cumbersome for the application
programmer. A new set of parallel programming interfaces for netCDF files,
parallel netCDF (PnetCDF), has therefore been developed. 13
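The difference between funneling data through one writer and letting each process write its own region can be sketched without any parallel library. The example below is only a single-process simulation: a loop stands in for the processes ("ranks"), plain file seeks stand in for parallel I/O, and the function names are hypothetical. It does not use MPI or the PnetCDF API.

```python
import os
import tempfile

ITEM = 8  # bytes per value in this sketch


def encode(values):
    # Fixed-width big-endian integers keep offsets easy to compute.
    return b"".join(v.to_bytes(ITEM, "big") for v in values)


def write_serialized(path, blocks):
    """Funnel pattern: one writer receives every block and writes in rank order."""
    with open(path, "wb") as f:
        for block in blocks:  # stands in for gathering the data on one process
            f.write(encode(block))


def write_at_offsets(path, blocks):
    """Each simulated rank writes its block at an offset it computes itself."""
    sizes = [len(block) * ITEM for block in blocks]
    with open(path, "wb") as f:
        f.truncate(sum(sizes))  # pre-size the shared file
    # Ranks are visited in reverse to show that write order does not matter.
    for rank in reversed(range(len(blocks))):
        offset = sum(sizes[:rank])
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(encode(blocks[rank]))


blocks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # one block per simulated rank
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "serialized.bin")
    path_b = os.path.join(tmp, "offsets.bin")
    write_serialized(path_a, blocks)
    write_at_offsets(path_b, blocks)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        bytes_a, bytes_b = fa.read(), fb.read()
    same_file = bytes_a == bytes_b
```

Both functions produce the same file. The second needs no central writer because every rank can derive its offset from the block sizes alone, which is the kind of access pattern a parallel interface is meant to make convenient.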
The netCDF file format follows the common data form language (CDL), a text
notation that human readers can interpret. It divides a netCDF file into two
parts: a file header and a body. The header contains all information about
dimensions, attributes, and scalar variables, and is followed by the body,
which contains the arrays of variable values in binary form. The header first
defines a number of dimensions, each with a name and a length, which can be
used to describe the shapes of arrays. The most significant dimension of a
multidimensional array can be unlimited, for arrays that grow in size. Global
attributes not associated with any particular array can also be added to the
header. This feature allows programmers' annotations and other related
information to be added to increase the file's readability. The body of a
netCDF file first stores the fixed-size arrays, followed by the
variable-sized arrays. For storing a variable-sized array, netCDF defines
each subarray comprising all the fixed dimensions as a record, and the
records of the different arrays are stored interleaved. The offsets of all
fixed-size and variable-sized arrays are saved in the header.
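The body layout described above can be made concrete with a small offset calculation. This is a simplified sketch, not the netCDF library: it assumes a caller-supplied `header_size`, ignores the padding and header encoding that a real netCDF file uses, and all names (`layout`, `record_offset`, the example variables) are hypothetical.

```python
from math import prod

TYPE_SIZE = {"float": 4, "double": 8, "int": 4}  # bytes per element


def layout(dims, variables, header_size):
    """Return (fixed_offsets, record_offsets, record_size).

    dims:      name -> length, with None marking the unlimited dimension
    variables: list of (name, type, dimension names), in definition order
    """
    fixed, record = {}, {}
    offset = header_size
    # Fixed-size arrays come first, stored back to back after the header.
    for name, vtype, vdims in variables:
        if not any(dims[d] is None for d in vdims):
            fixed[name] = offset
            offset += TYPE_SIZE[vtype] * prod(dims[d] for d in vdims)
    record_start = offset
    # One record of a variable-sized array spans all of its fixed dimensions.
    within = 0
    for name, vtype, vdims in variables:
        if any(dims[d] is None for d in vdims):
            record[name] = record_start + within
            fixed_lengths = [dims[d] for d in vdims if dims[d] is not None]
            within += TYPE_SIZE[vtype] * prod(fixed_lengths)
    return fixed, record, within


def record_offset(record, record_size, name, r):
    """Byte offset of record r of a variable-sized array (records interleave)."""
    return record[name] + r * record_size


dims = {"time": None, "lat": 4, "lon": 5}
variables = [
    ("elevation", "float", ("lat", "lon")),
    ("temp", "float", ("time", "lat", "lon")),
    ("pressure", "double", ("time", "lat", "lon")),
]
fixed, record, record_size = layout(dims, variables, header_size=256)
```

Here one combined record holds one `temp` slab followed by one `pressure` slab, so consecutive records of the same array are `record_size` bytes apart. That spacing is what interleaving means in this layout, and it is why appending along the unlimited dimension never requires moving existing data.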