Anyone who has spent hours fiddling with byte-offsets while trying to read “simple”
binary formats can appreciate this.
Finally, the low-level byte layout of an HDF5 file on disk is an open specification. There
are no mysteries about how it works, in contrast to proprietary binary formats. And
although people typically use the library provided by the HDF Group to access files,
nothing prevents you from writing your own reader if you want.
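As a small illustration of what an open layout makes possible, the sketch below looks for the HDF5 format signature using only the Python standard library. It assumes the published rule that the 8-byte signature sits at byte offset 0 or at 512, 1024, 2048, and so on, doubling each time. The function name find_hdf5_signature and the max_offset cutoff are hypothetical choices for this example, not part of any HDF Group API, and a real reader would go on to parse the superblock that follows the signature.

```python
import io

# The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(stream, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if not found.

    `stream` is any seekable binary file object. `max_offset` is an
    arbitrary cutoff chosen for this sketch so the search terminates.
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        # Candidate offsets are 0, 512, 1024, 2048, ... (doubling).
        offset = 512 if offset == 0 else offset * 2
    return None


# Usage with in-memory data standing in for a file on disk:
plain = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 64)
with_user_block = io.BytesIO(b"\x00" * 512 + HDF5_SIGNATURE)
print(find_hdf5_signature(plain))
print(find_hdf5_signature(with_user_block))
```

The same function works on a real file opened with open(path, "rb"); a return value of None suggests the file is not HDF5, at least within the searched range.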
HDF5: The Library
The HDF5 file specification and open source library are maintained by the HDF Group,
a nonprofit organization headquartered in Champaign, Illinois. The HDF Group was formerly
part of the University of Illinois Urbana-Champaign, and its primary product is the
HDF5 software library.
Written in C, with additional bindings for C++ and Java, this library is what people
usually mean when they say “HDF5.” Both of the most popular Python interfaces,
PyTables and h5py, are designed to use the C library provided by the HDF Group.
One important point is that this library is actively maintained, and the developers
place a strong emphasis on backwards compatibility. This applies to both the files
the library produces and also to programs that use the API. File compatibility is a must
for an archival format like HDF5. Such careful attention to API compatibility is the
main reason that packages like h5py and PyTables have been able to get traction with
many different versions of HDF5 installed in the wild.
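Because the Python packages sit on top of whichever HDF5 C library they were built against, it can be handy to check which versions are in play. The snippet below is a minimal sketch that assumes h5py may or may not be installed. The helper name hdf5_versions is hypothetical; the attributes it reads, h5py.version.version and h5py.version.hdf5_version, are the version strings h5py exposes.

```python
try:
    import h5py
except ImportError:  # h5py is optional for this sketch
    h5py = None


def hdf5_versions():
    """Return the h5py and HDF5 C library versions, or None without h5py."""
    if h5py is None:
        return None
    return {
        "h5py": h5py.version.version,       # version of the Python package
        "hdf5": h5py.version.hdf5_version,  # version of the underlying C library
    }


print(hdf5_versions())
```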
You should have confidence when using HDF5 for scientific data storage, including
long-term storage. And since both the library and format are open source, your files
will be readable even if a meteor takes out Illinois.
HDF5: The Ecosystem
Finally, one aspect that makes HDF5 particularly useful is that you can read and write
files from just about every platform. The IDL language has supported HDF5 for years;
MATLAB has similar support and now even uses an HDF5-based format for its version 7.3
“.mat” save files. Bindings are also available for Python, C++, Java, .NET, and LabVIEW,
among others. Institutional users include NASA's Earth Observing System, whose
“EOS5” format is an application format layered on top of the HDF5 container, much like
the far simpler example earlier. Even the newest version of the competing NetCDF format,
NetCDF4, is implemented using HDF5 groups, datasets, and attributes.
Hopefully I've been able to share with you some of the things that make HDF5 so exciting
for scientific use. Next, we'll review the basics of how HDF5 works and get started on
using it from Python.