There's a lot going on here, but it's pretty straightforward. The root group has attributes
for a timestamp (when the file was written), along with a version number for the “format”
used to structure the file using groups, datasets, and attributes.
Then each particle that goes down the beamline has its own group. The group attributes
record the analyzed mass and velocity, along with an integer that uniquely identifies the
event. Finally, the three waveforms with our original data are recorded in the particle
group. They also have an attribute, in this case giving the sampling interval of the time
series.
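To make the layout concrete, here is a minimal sketch of how such a file might be written with h5py. Only the names first_detector, second_detector, and dt come from the text; the function write_run_file, the third dataset name, and the attribute names timestamp, format_version, event_id, mass, and velocity are assumptions chosen for illustration.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def write_run_file(path, particles, format_version=1):
    """Write one run using the layout described above.

    Apart from 'dt' and the first two dataset names, every name used
    here is an illustrative assumption, not the original file's schema.
    """
    with h5py.File(path, "w") as f:
        # Root attributes: when the file was written, and the layout version
        f.attrs["timestamp"] = time.time()
        f.attrs["format_version"] = format_version

        for event_id, particle in enumerate(particles):
            # One group per particle, named by its index ('/0', '/1', ...)
            grp = f.create_group(str(event_id))
            grp.attrs["event_id"] = event_id
            grp.attrs["mass"] = particle["mass"]
            grp.attrs["velocity"] = particle["velocity"]

            # Three waveform datasets, each carrying its sampling interval
            for name in ("first_detector", "second_detector", "third_detector"):
                dset = grp.create_dataset(name, data=particle[name])
                dset.attrs["dt"] = particle["dt"]


# Synthetic data standing in for real measurements
rng = np.random.default_rng(0)
particles = [
    {
        "mass": 1.0e-15,
        "velocity": 5.0e3,
        "dt": 1.0e-8,
        "first_detector": rng.normal(size=100),
        "second_detector": rng.normal(size=100),
        "third_detector": rng.normal(size=100),
    }
]

path = os.path.join(tempfile.mkdtemp(), "example_run.hdf5")
write_run_file(path, particles)
```

Because the attributes travel with the groups and datasets they describe, a reader of the file needs no side channel to learn the sampling interval or which event a waveform belongs to.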
Analyzing the Data
The crucial thing here is that the metadata required to make sense of the raw waveforms
is stored right next to the data. For example, time series like our waveforms are useless
unless you also know the time spacing of the samples. In the preceding file, that time
interval (dt) is stored as an attribute on the waveform dataset. If we wanted to plot a
waveform with the correct time scaling, all we have to do is:
import h5py
import numpy as np
import matplotlib.pyplot as p

f = h5py.File("November_Run3.hdf5", 'r')
# Retrieve HDF5 dataset
first_detector = f['/0/first_detector']
# Make a properly scaled time axis
x_axis = np.arange(len(first_detector)) * first_detector.attrs['dt']
# Plot the result
p.plot(x_axis, first_detector[...])
There's another great way HDF5 can simplify your analysis. With other formats, it's
common to have an input file or files, a code that processes them, and a “results” file
with the output of your computation. With HDF5, you can have one file containing both
the input data and the results of your analysis.
For example, suppose we wrote a piece of code that determined the electrical charge on
the particle from the waveform data. We can store this right in the file next to the
estimates for mass and velocity:
from some_science_package import charge_estimator

def update_particle_group(grp):
    # Retrieve waveform data
    first_det = grp['first_detector'][...]
    second_det = grp['second_detector'][...]
    # Retrieve time scaling data
    dt = grp['first_detector'].attrs['dt']
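The listing stops before the result is written back to the file. As a hedged sketch of that last step, the code below stores a computed value as an attribute on the particle group, next to mass and velocity. The function estimate_charge is a hypothetical stand-in with no physical meaning (the real charge_estimator is not shown in the text), and the attribute name charge and the helper store_charge are likewise assumptions.

```python
import os
import tempfile

import h5py
import numpy as np


def estimate_charge(first_det, second_det, dt):
    """Hypothetical placeholder for a real estimator (NOT a physical model).

    It integrates the average of the two waveforms over time.
    """
    return float(np.sum((first_det + second_det) / 2.0) * dt)


def store_charge(grp):
    # Retrieve waveform data and the sampling interval
    first_det = grp["first_detector"][...]
    second_det = grp["second_detector"][...]
    dt = grp["first_detector"].attrs["dt"]
    # Store the result on the group, alongside the other per-particle values
    grp.attrs["charge"] = estimate_charge(first_det, second_det, dt)


# Build a tiny synthetic file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "charge_example.hdf5")
with h5py.File(path, "w") as f:
    grp = f.create_group("0")
    for name in ("first_detector", "second_detector"):
        dset = grp.create_dataset(name, data=np.ones(10))
        dset.attrs["dt"] = 0.5

# Reopen in read/write mode ('r+') to update the existing file in place
with h5py.File(path, "r+") as f:
    for grp in f.values():
        store_charge(grp)
    stored_charge = f["0"].attrs["charge"]
```

Opening the file with mode 'r+' rather than 'w' matters here: 'w' would truncate the file and discard the input waveforms, while 'r+' keeps them and adds the result beside them.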