3. File size should scale without significantly affecting access performance.
Files are expected to vary widely in size, but access performance should
not degrade in proportion. For example, reading the same amount of
information from a 1-GB file and from a 100-GB file should take
comparable time.
4. Reduce the dependence on third-party software. Like their customers,
software vendors want to avoid the complexity of multiple software
facilities that are outside their control but that they must still
support in the eyes of their customers. Furthermore, when third-party
software is used, Open Source Software (OSS) in particular, changes
to its source code should be kept to a minimum.
This list of requirements is limited to the ones that are relevant for the dis-
cussion in this section.
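Requirement 3 can be illustrated with a minimal sketch. It is not taken from any particular library: the fixed-size record layout, the `RECORD_SIZE` value, and the helper names `write_records` and `read_record` are assumptions made purely for illustration. The point is that when the location of the requested data can be computed directly, the amount of data transferred does not depend on the size of the file.

```python
import os
import tempfile

RECORD_SIZE = 256  # bytes per record; a hypothetical fixed-size layout


def write_records(path, n_records):
    """Write n_records fixed-size records; record i stores the integer i."""
    with open(path, "wb") as f:
        for i in range(n_records):
            f.write(i.to_bytes(8, "little").ljust(RECORD_SIZE, b"\0"))


def read_record(path, index):
    """Fetch one record by seeking directly to its byte offset."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        block = f.read(RECORD_SIZE)
    return int.from_bytes(block[:8], "little"), len(block)


workdir = tempfile.mkdtemp()
small = os.path.join(workdir, "small.bin")
large = os.path.join(workdir, "large.bin")
write_records(small, 100)      # about 25 KB
write_records(large, 10_000)   # about 2.5 MB, 100 times larger

# Both reads transfer exactly RECORD_SIZE bytes, whatever the file size.
small_value, small_bytes = read_record(small, 50)
large_value, large_bytes = read_record(large, 9_000)
```

Real simulation files are rarely this regular, so a practical format needs an index or a self-describing layout to locate data without scanning the file. The sketch only shows the property the requirement asks for.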
20.4 Using HDF5 in Industrial Stochastic Simulations
Few of the software systems available fit the bill. Many
software vendors devise their own solutions, in essence reinventing the wheel.
Such homebrewed solutions provide a lot of flexibility and control but also
increase development and maintenance costs. They cover only the limited set
of requirements known at the time of development and notoriously fail to
scale or adapt to different business and data models. Another potential
danger is that these solutions are tightly coupled to the client code. The
lack of appropriate interfaces makes it even more difficult to exchange parts
of the code. In other cases, vendors use systems that are not well suited for
the purpose. Database management systems and other general-purpose data
management systems have failed in the past to deliver. They impose a
compromise between I/O performance and access granularity, and that
compromise deteriorates as datasets increase in size.
HDF5 is a potentially strong contender in this arena. It is one of the
few libraries to advertise its ability to handle large datasets, and its
high-end industrial and academic provenance makes it appealing, or at least
suggests that it can be useful for resolving simulation data management
problems. Indeed, HDF5 is versatile and simple to use for storing simulation
data. The in-file data model, together with the ability to arrange data and
fine-tune the access granularity and layout on disk, makes it possible for
different clients to access data without sharing data models, while providing
adequate I/O performance. However, a parallel I/O implementation that
fulfills the requirements of industrial customers faces some significant
challenges. In the following, some areas where HDF5 shines or fails are
outlined.
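The effect of on-disk layout on access granularity can be sketched with a small back-of-the-envelope model. This is not HDF5 code and calls no HDF5 API. It assumes only that a chunked dataset is read in whole chunks, and it ignores caching, compression, and partially filled edge chunks. The function names, the dataset orientation (realizations by time steps), and the chunk shapes are hypothetical.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Count the chunks intersected by a rectangular selection."""
    n = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k              # index of the first chunk on this axis
        last = (s + c - 1) // k     # index of the last chunk on this axis
        n *= last - first + 1
    return n


def read_amplification(start, count, chunk_shape, itemsize=8):
    """Ratio of bytes transferred (whole chunks) to bytes requested."""
    requested = prod(count) * itemsize
    transferred = chunks_touched(start, count, chunk_shape) * prod(chunk_shape) * itemsize
    return transferred / requested


# Hypothetical request: all 1000 time steps of realization 42.
start, count = (42, 0), (1, 1000)

row_chunks = read_amplification(start, count, (1, 1000))      # 1.0
square_chunks = read_amplification(start, count, (100, 100))  # 100.0
column_chunks = read_amplification(start, count, (1000, 1))   # 1000.0
```

Under these assumptions, a layout matched to the access pattern transfers only the requested data, while a mismatched layout transfers up to a thousand times more. A client that reads along the other axis would see the ranking reversed, which is why the layout has to be tuned to the dominant access pattern.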
 