of items, each of which is orderable according to some scheme like a string name or
numeric identifier, and building up a tree-like “index” to rapidly retrieve an item.
For example, if you have an HDF5 group with a single member, and another group with
a million members, it doesn't take a million times as long to open an object in the latter
group. Group members are indexed by name, so if you know the name of an object then
HDF5 can traverse the index and quickly retrieve the item. The same is true when
creating a new group member; HDF5 doesn't have to “insert” the member into the
middle of a big table somewhere, shuffling all the entries around.
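The point about name-based lookup can be illustrated with a small sketch. It assumes a recent h5py under Python 3 and uses an in-memory file (driver='core' with backing_store=False) so nothing is written to disk; the file name and the member names are made up for illustration.

```python
import h5py

# In-memory file: nothing is written to disk.
with h5py.File('lookupdemo.hdf5', 'w', driver='core', backing_store=False) as f:
    small = f.create_group('small')
    small.create_group('only')

    big = f.create_group('big')
    for i in range(2000):
        big.create_group('member_%05d' % i)

    # Opening a member by name works the same way in both groups;
    # the caller never scans the membership list.
    target = big['member_01500']
    found_name = target.name

    # Membership tests are also answered by name.
    missing = 'member_99999' in big
    counts = (len(small), len(big))

print(found_name, missing, counts)
```

The sketch shows only that the interface is identical regardless of group size; it does not measure lookup time.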
Of course, all of this is transparent to the user. Every group in an HDF5 file comes with
an index that tracks members in alphabetical order. Keep in mind this means “C-style”
alphabetical order (whimsically called “ASCIIbetical” order):
>>> import h5py
>>> f = h5py.File('iterationdemo.hdf5', 'w')
>>> f.create_group('1')
>>> f.create_group('2')
>>> f.create_group('10')
>>> f.create_dataset('data', (100,))
>>> f.keys()
[u'1', u'10', u'2', u'data']
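The same ordering can be reproduced with plain Python strings, with no HDF5 involved. This is a minimal sketch; the names 'Zebra' and 'apple' are added here only to show how letter case is handled.

```python
names = ['1', '2', '10', 'data', 'Zebra', 'apple']

# Plain string sorting compares character codes one position at a time,
# so '10' precedes '2' (because '1' < '2'), digits precede letters, and
# every uppercase letter precedes every lowercase one.
asciibetical = sorted(names)
print(asciibetical)  # ['1', '10', '2', 'Zebra', 'apple', 'data']
```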
Files can also contain other optional indices, for example those that track object creation
time, but h5py doesn't expose them.
This brings us to the first point: h5py will generally iterate over objects in the file in
alphabetical order (especially for small groups), but you shouldn't rely on this behavior.
Behind the scenes, HDF5 is actually retrieving objects in so-called native order, which
basically means “as fast as possible.” The only thing that's guaranteed is that if you don't
modify the group, the order will remain the same.
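If code needs a particular order, one option is to sort the names explicitly rather than depend on native order. The sketch below assumes a recent h5py under Python 3 and an in-memory file; natural_key is a hypothetical helper written for this example, not part of h5py.

```python
import re
import h5py

def natural_key(name):
    """Hypothetical helper: split a name into digit and non-digit runs
    so that numeric parts compare as integers ('2' before '10')."""
    return [(0, int(part), '') if part.isdigit() else (1, 0, part)
            for part in re.findall(r'\d+|\D+', name)]

# In-memory file: nothing is written to disk.
with h5py.File('orderdemo.hdf5', 'w', driver='core', backing_store=False) as f:
    for name in ('1', '2', '10'):
        f.create_group(name)
    f.create_dataset('data', (100,))

    # Sort the member names ourselves instead of trusting iteration order.
    ordered = sorted(f, key=natural_key)

print(ordered)  # ['1', '2', '10', 'data']
```

Calling sorted(f) with no key gives the plain character-code ordering instead.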
Dictionary-Style Iteration
In keeping with the general convention that groups work like dictionaries, iterating over
a group in HDF5 provides the names of the members. Remember, these will be supplied
as Unicode strings:
>>> [x for x in f]
[u'1', u'10', u'2', u'data']
There are also iterkeys (equivalent to the preceding), itervalues, and iteritems
methods, which do just what you'd expect:
>>> [y for y in f.itervalues()]
[<HDF5 group "/1" (0 members)>,
<HDF5 group "/10" (0 members)>,
<HDF5 group "/2" (0 members)>,
<HDF5 dataset "data": shape (100,), type "<f4">]
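The iterkeys, itervalues, and iteritems methods follow the Python 2 dictionary interface. Under Python 3, the equivalent calls are keys(), values(), and items(). The sketch below assumes a recent h5py and an in-memory file with a made-up name. Because iteration order is not guaranteed, the names are sorted before use.

```python
import h5py

# In-memory file: nothing is written to disk.
with h5py.File('iterationdemo_py3.hdf5', 'w', driver='core',
               backing_store=False) as f:
    f.create_group('1')
    f.create_dataset('data', (100,))

    # keys() yields member names, values() the objects, items() both.
    names = sorted(f.keys())
    kinds = {name: type(obj).__name__ for name, obj in f.items()}
    n_values = len(list(f.values()))

print(names)     # ['1', 'data']
print(kinds)
print(n_values)
```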