Introduction - Python and HDF5


>>> out = np.load("weather.npz")
>>> out["data"]
array([ 0.44149738,  0.7407523 ,  0.44243584, ...,  0.19018119,
        0.64844851,  0.55660748])
>>> out["start_time"]
array(1375204299)
>>> out["station"]
array(15)

So far so good. But what if we have more than one quantity per station? Say there's also wind speed data to record?

>>> wind = np.random.random(2048)
>>> dt_wind = 5.0   # Wind sampled every 5 seconds

And suppose we have multiple stations. We could introduce some kind of naming convention, I suppose: “wind_15” for the wind values from station 15, and things like “dt_wind_15” for the sampling interval. Or we could use multiple files…
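To make the cost of that naming convention concrete, here is a minimal sketch of what it might look like with NumPy's `.npz` format. The key names for temperature, the array sizes, and the temporary file location are assumptions made for illustration; only “wind_15” and “dt_wind_15” come from the text above.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for real measurements (sizes are assumptions).
temperature = np.random.random(1024)
wind = np.random.random(2048)

# Flat namespace: the station ID and the quantity both have to be
# encoded in every key name.
arrays = {
    "temperature_15": temperature,
    "dt_temperature_15": 10.0,
    "start_time_15": 1375204299,
    "wind_15": wind,
    "dt_wind_15": 5.0,
}

# Write to a temporary directory so the sketch leaves no stray files.
path = os.path.join(tempfile.mkdtemp(), "weather.npz")
np.savez(path, **arrays)

with np.load(path) as out:
    # Finding "everything for station 15" means parsing key names.
    station_15_keys = sorted(k for k in out.files if k.endswith("_15"))
    dt_wind = float(out["dt_wind_15"])

print(station_15_keys)
print(dt_wind)
```

Nothing in the file itself records that “dt_wind_15” describes “wind_15”; that relationship lives only in the naming scheme, which every reader of the file has to know in advance.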

In contrast, here's how this application might approach storage with HDF5:

>>> import h5py
>>> f = h5py.File("weather.hdf5", "a")   # Read/write; create the file if needed
>>> f["/15/temperature"] = temperature
>>> f["/15/temperature"].attrs["dt"] = 10.0
>>> f["/15/temperature"].attrs["start_time"] = 1375204299
>>> f["/15/wind"] = wind
>>> f["/15/wind"].attrs["dt"] = 5.0
---
>>> f["/20/temperature"] = temperature_from_station_20
---
(and so on)
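For readers who want to run the session above end to end, the following is a self-contained sketch of the same steps as a script. The random arrays stand in for real measurements, and the temporary file path and explicit `"w"` and `"r"` modes are choices made for this example rather than part of the original session.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-ins for real measurements (sizes are assumptions).
temperature = np.random.random(1024)
wind = np.random.random(2048)

path = os.path.join(tempfile.mkdtemp(), "weather.hdf5")

with h5py.File(path, "w") as f:
    # Assigning an array to a path creates the dataset, along with
    # any missing intermediate groups (here, the group "/15").
    f["/15/temperature"] = temperature
    f["/15/temperature"].attrs["dt"] = 10.0
    f["/15/temperature"].attrs["start_time"] = 1375204299
    f["/15/wind"] = wind
    f["/15/wind"].attrs["dt"] = 5.0

# Reopen read-only to confirm the layout survived the round trip.
with h5py.File(path, "r") as f:
    station_names = list(f.keys())
    quantities = sorted(f["15"].keys())
    wind_dt = f["/15/wind"].attrs["dt"]
    wind_shape = f["/15/wind"].shape

print(station_names, quantities, wind_dt, wind_shape)
```

Using the file as a context manager ensures it is closed, and the data flushed to disk, even if an exception interrupts the writes.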

This example illustrates two of the “killer features” of HDF5: organization in hierarchical groups and attributes. Groups, like folders in a filesystem, let you store related datasets together. In this case, temperature and wind measurements from the same weather station are stored together under groups named “/15,” “/20,” etc. Attributes let you attach descriptive metadata directly to the data it describes. So if you give this file to a colleague, she can easily discover the information needed to make sense of the data:

>>> dataset = f["/15/temperature"]
>>> for key, value in dataset.attrs.items():
...     print("%s: %s" % (key, value))
...
dt: 10.0
start_time: 1375204299
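The same kind of discovery can be extended to a whole file. As a rough sketch, a colleague who knows nothing about the layout could walk every group with h5py's `visititems` and collect each dataset's attributes. The `describe` callback, the `summary` dictionary, and the sample file built here are hypothetical names and data introduced only for this example.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "weather.hdf5")

# Build a small sample file with two stations (synthetic data).
with h5py.File(path, "w") as f:
    f["/15/temperature"] = np.random.random(1024)
    f["/15/temperature"].attrs["dt"] = 10.0
    f["/15/temperature"].attrs["start_time"] = 1375204299
    f["/20/temperature"] = np.random.random(1024)
    f["/20/temperature"].attrs["dt"] = 10.0

summary = {}

def describe(name, obj):
    """Record the attributes of every dataset encountered in the walk."""
    if isinstance(obj, h5py.Dataset):
        summary[name] = dict(obj.attrs)
    # Returning None tells visititems to keep walking.

with h5py.File(path, "r") as f:
    # Recursively visits every group and dataset below the root.
    f.visititems(describe)

for name in sorted(summary):
    print(name, summary[name])
```

Because the metadata travels inside the file, this walk needs no external naming convention or side-channel documentation to recover the sampling interval for each dataset.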

Coping with Large Data Volumes

As a high-level “glue” language, Python is increasingly being used for rapid visualization

of big datasets and to coordinate large-scale computations that run in compiled lan‐
