HDF5 is quite different from SQL-style relational databases. HDF5 has quite a few
organizational tricks up its sleeve (see Chapter 8, for example), but if you find yourself
needing to enforce relationships between values in various tables, or wanting to perform
JOINs on your data, a relational database is probably more appropriate. Likewise, for tiny
1D datasets that you need to read on machines without HDF5 installed, text formats like
CSV (with all their warts) are a reasonable alternative.
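To make the CSV suggestion concrete, here is a minimal sketch that copies a tiny 1D HDF5 dataset out to a CSV file, which can then be read back with nothing but Python's standard library. It assumes h5py and NumPy are installed on the machine doing the export; the file names and the dataset name `readings` are hypothetical.

```python
import csv
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, for illustration only.
workdir = tempfile.mkdtemp()
h5_path = os.path.join(workdir, "small.hdf5")
csv_path = os.path.join(workdir, "small.csv")

# Create a tiny 1D dataset in an HDF5 file.
with h5py.File(h5_path, "w") as f:
    f.create_dataset("readings", data=np.arange(5, dtype="f8") * 1.5)

# Export it to CSV: a header line, then one value per row.
with h5py.File(h5_path, "r") as f:
    values = f["readings"][...]  # the dataset is tiny, so read all of it
with open(csv_path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["readings"])
    for v in values:
        writer.writerow([repr(float(v))])

# A machine without HDF5 can read the data using only the standard library.
with open(csv_path, newline="") as fh:
    rows = list(csv.reader(fh))
recovered = [float(r[0]) for r in rows[1:]]
print(recovered)
```

The trade-off is the one the text describes: the CSV file loses the dtype, attributes, and hierarchy, but it opens anywhere.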
HDF5 is just about perfect if you make minimal use of relational features and have a
need for very high performance, partial I/O, hierarchical organization, and arbitrary
metadata.
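Partial I/O means you can read just a slice of a dataset without loading the whole array. The sketch below assumes h5py and NumPy and uses a made-up file name; it writes a 1000 x 1000 array and then reads two small regions. h5py translates each slice into an HDF5 selection, so only the requested elements are transferred from the file.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "partial.hdf5")  # hypothetical file

# Write a 1000 x 1000 float32 dataset (about 4 MB) holding 0, 1, 2, ...
with h5py.File(path, "w") as f:
    f.create_dataset("big", data=np.arange(1000 * 1000, dtype="f4").reshape(1000, 1000))

with h5py.File(path, "r") as f:
    dset = f["big"]              # only a handle; no array data read yet
    corner = dset[0:2, 0:3]      # reads a 2 x 3 block
    row_part = dset[500, 10:15]  # reads 5 elements from row 500

print(corner.tolist())
print(row_part.tolist())
```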
So what, specifically, is “HDF5”? I would argue it consists of three things:
1. A file specification and associated data model.
2. A standard library with API access available from C, C++, Java, Python, and others.
3. A software ecosystem, consisting of both client programs using HDF5 and “analysis
platforms” like MATLAB, IDL, and Python.
HDF5: The File
In the preceding brief examples, you saw the three main elements of the HDF5 data
model: datasets, array-like objects that store your numerical data on disk; groups,
hierarchical containers that store datasets and other groups; and attributes, user-defined
bits of metadata that can be attached to datasets (and groups!).
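As a quick refresher on these three elements, this sketch creates one group, puts a dataset inside it, and attaches attributes to both. It assumes h5py and NumPy; the group, dataset, and attribute names are illustrative only.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "model.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    # A group: a hierarchical container, much like a directory.
    grp = f.create_group("experiment")
    # A dataset: an array-like object stored on disk, here inside the group.
    dset = grp.create_dataset("signal", data=np.linspace(0.0, 1.0, 11))
    # Attributes: small named pieces of metadata on a dataset...
    dset.attrs["units"] = "volts"
    # ...and on a group, too.
    grp.attrs["operator"] = "A. Researcher"

with h5py.File(path, "r") as f:
    signal = f["experiment/signal"]
    shape = signal.shape
    units = signal.attrs["units"]
    operator = f["experiment"].attrs["operator"]

print(shape, units, operator)
```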
Using these basic abstractions, users can build specific “application formats” that
organize data in a way appropriate for the problem domain. For example, our “weather
station” code used one group for each station, and separate datasets for each measured
parameter, with attributes to hold additional information about what the datasets mean.
It's very common for laboratories or other organizations to agree on such a
“format-within-a-format” that specifies what arrangement of groups, datasets, and
attributes is to be used to store information.
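An application format is only useful if files actually follow it. The following sketch checks a file against a hypothetical station layout in which every top-level group must contain `temperature` and `wind` datasets, each carrying a `units` attribute. These rules and the `check_station_layout` helper are invented for illustration; they are not part of the weather-station code or of HDF5 itself. It assumes h5py and NumPy.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical rules for a "format-within-a-format".
REQUIRED_DATASETS = ("temperature", "wind")
REQUIRED_ATTR = "units"


def check_station_layout(f):
    """Return a list of problems; an empty list means the file conforms."""
    problems = []
    for name, obj in f.items():
        if not isinstance(obj, h5py.Group):
            problems.append(f"/{name}: expected a station group")
            continue
        for dname in REQUIRED_DATASETS:
            dset = obj.get(dname)  # None if no such member exists
            if not isinstance(dset, h5py.Dataset):
                problems.append(f"/{name}/{dname}: missing dataset")
            elif REQUIRED_ATTR not in dset.attrs:
                problems.append(f"/{name}/{dname}: missing '{REQUIRED_ATTR}' attribute")
    return problems


path = os.path.join(tempfile.mkdtemp(), "stations.hdf5")  # hypothetical file
with h5py.File(path, "w") as f:
    # Station "15" follows the rules.
    good = f.create_group("15")
    for dname, units in (("temperature", "degC"), ("wind", "m/s")):
        good.create_dataset(dname, data=np.zeros(10))
        good[dname].attrs[REQUIRED_ATTR] = units
    # Station "16" breaks them: no units on temperature, and no wind data.
    bad = f.create_group("16")
    bad.create_dataset("temperature", data=np.zeros(10))

with h5py.File(path, "r") as f:
    report = check_station_layout(f)

for line in report:
    print(line)
```

Returning a list of problems rather than raising on the first one lets a lab report every deviation in a shared file at once.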
Since HDF5 handles cross-platform issues like endianness, sharing data with other
research groups becomes a simple matter of manipulating groups, datasets, and attributes
to get the desired result. And because the files are self-describing, you usually don't
even need to know the application format to get data out of the file. You can simply
open the file and explore its contents:
>>> f.keys()
[u'15', u'big', u'comp']
>>> f["/15"].keys()
[u'temperature', u'wind']
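Exploration can go beyond `keys()`. This sketch, assuming h5py and NumPy and using a made-up file, walks every object with `visititems` and records each one's kind, shape, dtype, and attributes. One dataset is deliberately stored big-endian to show that, because the byte order is recorded in the file, its values read back identically to the little-endian copy.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "explore.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    grp = f.create_group("15")
    grp.create_dataset("temperature", data=np.arange(4, dtype="<f4"))
    # Deliberately big-endian; HDF5 stores the byte order with the data.
    grp.create_dataset("wind", data=np.arange(4, dtype=">f4"))
    grp["temperature"].attrs["units"] = "degC"

entries = []


def describe(name, obj):
    """visititems callback: record each object's kind, shape, dtype, attrs."""
    if isinstance(obj, h5py.Dataset):
        info = f"dataset shape={obj.shape} dtype={obj.dtype.str}"
    else:
        info = "group"
    entries.append((name, f"{info} attrs={dict(obj.attrs)}"))
    # Returning None tells visititems to keep visiting.


with h5py.File(path, "r") as f:
    f.visititems(describe)
    # Little- and big-endian copies compare equal once read into NumPy.
    same = np.array_equal(f["15/temperature"][...], f["15/wind"][...])

for name, info in entries:
    print(name, "->", info)
print("values equal:", same)
```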