Conclusions and Future Outlook
Arie Shoshani and Doron Rotem, Editors
In response to the increasing volume and complexity of scientific data, many
projects and activities have taken place over the last decade. In this topic we
have addressed these issues in depth by describing existing technologies and
how they were used successfully in real applications. Specifically, we
addressed the state of the art of storage technologies, techniques for
achieving efficient I/O, and standards for managing storage systems and
large-scale data movement. We also covered methods for efficient in-memory
data movement, workflow technology to achieve real-time monitoring, and
metadata and provenance management techniques to collect the history of runs. Other
topics that were discussed in detail include optimization methods for
managing streaming data from sensors and satellites, the integration of diverse
geoscience datasets, efficient methods for data searching, analysis, and
visualization for data exploration, and new types of scientific data management
systems.
Some of the main conclusions in the chapters of this topic are captured
next. In the storage area, although new storage technologies are emerging,
the high-performance computing industry is expected to continue relying
primarily on disks and tapes to store data in the near term. A
new technology that is gaining importance is the solid-state drive (SSD).
The cost of SSDs has dropped sufficiently to make such
devices viable when integrated alongside traditional hard drives in a parallel
storage system in order to improve the latency of common operations. The main
challenges to the continued use of larger numbers of disks with increased
capacity are power consumption, recovery from failures, and maintaining data
integrity.
Building effective parallel file systems that can take advantage of many
thousands of disks will continue to be an important and challenging activity.
Standards for accessing different storage systems through a uniform interface have
already proven viable in simplifying the client applications that use them.
Such standards will continue to evolve in the future to permit co-scheduling
of compute, storage, and network resources. An important challenge is the
development of algorithms and schedulers that can make use of historical data
to optimize resource usage and recover from transient failures. It is predicted
that in the future it will not be feasible to store all the raw data generated by
large-scale simulations. Consequently, it will be necessary to process much of