In such applications, postprocessing usually generates summary data whose volume is much smaller. For example, climate-modeling simulation variables such as temperature, which are generated at a time granularity of a day or even an hour, are typically summarized in the postprocessing phase into monthly means. Another activity in the postprocessing phase is to reorganize the large volume of data to better fit the projected analysis patterns. For example, a combustion model simulation may generate a large number of variables per space-time cell of the simulation. The variables may include measures of pressure or temperature as well as many chemical species. In the analysis phase it is typical that not all variables are needed at the same time. Therefore, organizing the data by variable across all time steps avoids having to read the entire data volume in order to extract the few variables needed for the analysis. This process is often referred to as the “transposition” of the original data. In this phase, it is necessary to ensure that enough space is allocated for the input as well as for the output. Because of the potentially large volume of space required, postprocessing tasks are often performed piecewise or delegated to multiple processors, each dealing with part of the data volume. This introduces the issue of reliable data movement between sites and processors.
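To make the transposition concrete, the following Python sketch streams time-major simulation output into one file per variable. The reader load_step() and the variable names are hypothetical placeholders, not part of any real simulation code; the point is that each time step is read only once while each variable accumulates in its own file.

    import numpy as np

    VARIABLES = ["pressure", "temperature", "CH4", "O2"]  # illustrative names
    NUM_STEPS = 100
    NUM_CELLS = 1_000_000

    def load_step(step):
        """Placeholder reader: one time step as a (cells, variables) array."""
        return np.zeros((NUM_CELLS, len(VARIABLES)), dtype=np.float32)

    # One output file per variable; each step's columns are appended to the
    # matching per-variable files, so the full volume never sits in memory.
    outputs = [open(name + ".bin", "wb") for name in VARIABLES]
    try:
        for step in range(NUM_STEPS):
            data = load_step(step)              # one step in memory at a time
            for v, out in enumerate(outputs):
                out.write(data[:, v].tobytes()) # only this variable's values
    finally:
        for out in outputs:
            out.close()

In practice the loop over time steps would itself be partitioned across processors, each writing its share of the data, which is exactly where reliable data movement between sites and processors becomes an issue.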
The data analysis phase typically involves exploration over part of the data, such as climate analysis involving sea-surface temperature and wind velocity in the Pacific Ocean. The most reasonable approach is to extract the needed data at the site where the data is stored and to move only the needed subset to the analysis site. However, such data extraction capabilities are not always available, and scientists end up copying and storing more data than necessary. Alternatively, the data analysis could be performed at or near the site where the data resides. Here again, such analysis capabilities are not usually available. Furthermore, many scientists are only comfortable with their own analysis environments and usually copy the data they want to analyze to their local site. In general, at the data analysis phase, storage has to be allocated ahead of time in order to bring in a subset of the data for exploration and to store the subsequently generated data products. Furthermore, storage systems shared by a community of scientists need a common data access mechanism that allocates storage space dynamically, manages its content, and automatically removes unused data to avoid clogging the shared data stores.
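As an illustration of such subset extraction, the following sketch uses xarray to select only the variables and the region of interest before any data is moved. The file name and the variable names (sst, uwind, vwind) are assumptions made for the example, not a real dataset.

    import xarray as xr

    ds = xr.open_dataset("climate.nc")

    # Select only the variables and the Pacific region needed for the
    # analysis, instead of copying the full dataset to the analysis site.
    subset = ds[["sst", "uwind", "vwind"]].sel(
        lat=slice(-30.0, 30.0),   # tropical band
        lon=slice(120.0, 290.0),  # rough Pacific longitudes
    )

    # Write the (much smaller) subset; this is what actually gets moved.
    subset.to_netcdf("pacific_subset.nc")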
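The automatic-removal requirement can likewise be illustrated with a simple eviction sketch: when a shared store exceeds its quota, the least recently accessed files are deleted first. The directory path and quota here are illustrative assumptions.

    import os

    def enforce_quota(cache_dir, max_bytes):
        paths = [os.path.join(cache_dir, n) for n in os.listdir(cache_dir)]
        paths = [p for p in paths if os.path.isfile(p)]
        total = sum(os.path.getsize(p) for p in paths)
        # Oldest access time first: the best candidates for eviction.
        for p in sorted(paths, key=lambda p: os.stat(p).st_atime):
            if total <= max_bytes:
                break
            total -= os.path.getsize(p)
            os.remove(p)  # reclaim space held by unused data

    enforce_quota("/shared/scratch", max_bytes=500 * 2**30)  # 500 GiB quota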
When dealing with storage, another problem facing the scientist today is the need to interact with a variety of storage systems. Typically, each storage system provides different interfaces and security mechanisms. For these reasons, several activities emerged over time to standardize and streamline access to storage systems through common interfaces and to dynamically manage the storage allocation and the content of these systems. The goal is to present scientists or software utilities with the same interface regardless of the type of storage system used. Ideally, the management of storage allocation should become transparent to the client.
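A minimal sketch of such a common interface follows: an abstract allocate/put/get/release facade behind which any concrete storage system can sit. The class and method names are illustrative, not the API of any actual standard, and a toy in-memory backend stands in for a real disk or tape system.

    from abc import ABC, abstractmethod
    import uuid

    class StorageSystem(ABC):
        """Common facade: clients issue the same calls on any storage system."""

        @abstractmethod
        def allocate(self, nbytes: int) -> str:
            """Reserve space; return a token identifying the allocation."""

        @abstractmethod
        def put(self, token: str, name: str, data: bytes) -> None:
            """Store an object inside a previously allocated space."""

        @abstractmethod
        def get(self, token: str, name: str) -> bytes:
            """Retrieve an object from an allocated space."""

        @abstractmethod
        def release(self, token: str) -> None:
            """Free the allocation so its space can be reused."""

    class InMemoryStorage(StorageSystem):
        """Toy backend; a tape archive or parallel file system would plug
        in behind the same interface."""

        def __init__(self):
            self._spaces = {}  # token -> {"quota": int, "objects": dict}

        def allocate(self, nbytes):
            token = str(uuid.uuid4())
            self._spaces[token] = {"quota": nbytes, "objects": {}}
            return token

        def put(self, token, name, data):
            space = self._spaces[token]
            used = sum(len(d) for d in space["objects"].values())
            if used + len(data) > space["quota"]:
                raise IOError("allocated space exhausted")
            space["objects"][name] = data

        def get(self, token, name):
            return self._spaces[token]["objects"][name]

        def release(self, token):
            del self._spaces[token]

With such a facade, the client code that allocates space, stages data in, and releases it is identical whether the backend is a local disk, a mass-storage archive, or a remote site, which is precisely the transparency described above.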