>>> [(x, y) for x, y in f.iteritems()]
[(u'1', <HDF5 group "/1" (0 members)>),
(u'10', <HDF5 group "/10" (0 members)>),
(u'2', <HDF5 group "/2" (0 members)>),
(u'data', <HDF5 dataset "data": shape (100,), type "<f4">)]
There are also the standard keys, items, and values methods, which produce lists
equivalent to the three preceding examples. This brings us to the first performance tip
involving iteration and groups: unless you really want to produce a list of the 10,000
objects in your group, use the iter* methods.
If you're using Python 3, you'll notice that you have only the keys,
values, and items methods. That's OK; like dictionaries, under Python
3 these return iterables, not lists.
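Because the text leans on the dictionary analogy, here is a small sketch of that behavior using a plain Python 3 `dict`. The `members` dictionary is a hypothetical stand-in for a large group, and nothing here touches h5py itself. It shows that `keys()` hands back a view you can iterate without copying, and that a real list is only built when you explicitly ask for one.

```python
# Hypothetical stand-in for a group with 10,000 members: name -> object
members = {str(i): None for i in range(10000)}

keys_view = members.keys()

# Under Python 3, keys() returns a lightweight view, not a list
print(isinstance(keys_view, list))   # False

# Iterating the view walks the names one at a time without copying them
count = sum(1 for name in keys_view)
print(count)                         # 10000

# Build a list only when you genuinely need one (e.g. to sort or slice)
first_three = sorted(members, key=int)[:3]
print(first_three)                   # ['0', '1', '2']
```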
Containership Testing
This is another seemingly obvious performance issue that crops up from time to time.
If you're writing code like this, DON'T:
>>> if 'name' in group.keys():
This creates and throws away a list of all your group members every time you use it. By
instead using the standard Python containership test, you can leverage the underlying
HDF5 index on object names, which will go very, very fast:
>>> if 'name' in group:
Critically, you can also use paths spanning several groups, although it's very slightly
slower since the intermediate groups have to be inspected by HDF5:
>>> if 'some/big/path' in group:
Very handy. Keep in mind that, as with accessing group members, the POSIX-style “parent
directory” symbol ".." won't work. You won't even get an error message; HDF5 will look
for a group named ".." and determine that it's not present:
>>> '../1' in f['/1']
False
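To see why a multi-level path costs a little more, and why ".." is simply reported as missing, it can help to model name lookup with nested dictionaries. This is a conceptual sketch only, not how HDF5 or h5py is implemented: `contains_path` and the `tree` layout are hypothetical, a dict lookup plays the role of HDF5's per-group name index, and only relative paths are handled.

```python
def contains_path(group, path):
    """Model of a relative-path containment test on nested dicts.

    `group` stands in for an HDF5 group: keys are member names, values
    are nested dicts (subgroups) or anything else (datasets). Every
    path component is looked up as a literal name, so ".." is treated
    like any other name rather than as "parent directory".
    """
    if path.startswith("/"):
        raise ValueError("this sketch only models relative paths")
    node = group
    for part in path.split("/"):
        if part == "":
            continue  # treat repeated slashes as a single separator
        # Each intermediate level must exist and be a group (dict)
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


# Hypothetical file layout, loosely mirroring the listing above
tree = {
    "1": {}, "10": {}, "2": {},
    "data": [0.0] * 100,
    "some": {"big": {"path": {}}},
}

print(contains_path(tree, "some/big/path"))  # True: walks three levels
print(contains_path(tree, "data/x"))         # False: "data" is not a group
print(contains_path(tree["1"], "../1"))      # False: no member named ".."
```

The loop makes the cost model explicit: one lookup per path component, which is why a deep path is slightly slower than a single name.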
If you're manipulating POSIX-style strings and run into this problem, consider “normalizing”
your paths using the posixpath module:
>>> grp = f['/1']
>>> path = "../1"
>>> import posixpath as pp
>>> path = pp.normpath(pp.join(grp.name, path))
>>> path
u'/1'
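The normalization steps above can be wrapped in a small helper so they are applied consistently. This is a sketch under the assumption that you start from a group's absolute `.name` (such as "/1"); the helper name `resolve_member_path` is hypothetical, and only the standard-library `posixpath` module is used.

```python
import posixpath


def resolve_member_path(group_name, path):
    """Resolve a POSIX-style `path` relative to the group called `group_name`.

    `group_name` should be absolute (e.g. "/1", as reported by a group's
    .name). If `path` is itself absolute, posixpath.join discards
    group_name. normpath then collapses "." and ".." components; a ".."
    at the root is dropped, so the result never climbs above "/".
    """
    return posixpath.normpath(posixpath.join(group_name, path))


print(resolve_member_path("/1", "../1"))      # /1
print(resolve_member_path("/a/b", "../c"))    # /a/c
print(resolve_member_path("/a/b", "/x/./y"))  # /x/y
print(resolve_member_path("/", "../../z"))    # /z
```

With a real file you would then test the absolute result against the file object, for example `resolve_member_path(grp.name, '../1') in f`; because the resolved path starts at the root, the check no longer depends on which group you started from.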