In such applications, postprocessing usually generates summary data whose volume is much smaller. For example, climate-modeling simulation variables such as temperature, which are generated at a time granularity of a day or even an hour, are typically summarized in the postprocessing phase into monthly means. Another activity in the postprocessing phase is to reorganize the large volume of data to better fit the projected analysis patterns. For example, a combustion model simulation may generate a large number of variables per space-time cell of the simulation. The variables may include measures of pressure or temperature as well as many chemical species. In the analysis phase it is typical that not all variables are needed at the same time. Therefore, organizing the data by variable across all time steps avoids having to read the entire data volume in order to extract the few variables needed for the analysis. This process is often referred to as the “transposition” of the original data. In this phase, it is necessary to ensure that enough space is allocated for the input as well as for the output. Because of the potentially large volume of space required, postprocessing tasks are often performed piecewise or delegated to multiple processors, each dealing with part of the data volume. This introduces the issue of reliable data movement between sites and processors.
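To make the transposition concrete, the following Python sketch streams time-major simulation output into one file per variable. The reader load_step() and the variable names are hypothetical placeholders, not part of any real simulation code; the point is that each time step is read only once while each variable accumulates in its own file.

    import numpy as np

    VARIABLES = ["pressure", "temperature", "CH4", "O2"]  # illustrative names
    NUM_STEPS = 100
    NUM_CELLS = 1_000_000

    def load_step(step):
        """Placeholder reader: one time step as a (cells, variables) array."""
        return np.zeros((NUM_CELLS, len(VARIABLES)), dtype=np.float32)

    # One output file per variable; each step's columns are appended to the
    # matching per-variable files, so the full volume never sits in memory.
    outputs = [open(name + ".bin", "wb") for name in VARIABLES]
    try:
        for step in range(NUM_STEPS):
            data = load_step(step)              # one step in memory at a time
            for v, out in enumerate(outputs):
                out.write(data[:, v].tobytes()) # only this variable's values
    finally:
        for out in outputs:
            out.close()

In practice the loop over time steps would itself be partitioned across processors, each writing its share of the data, which is exactly where reliable data movement between sites and processors becomes an issue.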
The data analysis phase typically involves exploration over part of the data, such as climate analysis involving sea-surface temperature and wind velocity in the Pacific Ocean. The most reasonable approach is to extract the needed data at the site where the data is stored and to move only the needed subset to the analysis site. However, such data extraction capabilities are not always available, and scientists end up copying and storing more data than necessary. Alternatively, the data analysis could be performed at or near the site where the data resides. Here again, such analysis capabilities are not usually available. Furthermore, many scientists are only comfortable with their own analysis environments and usually copy the data they want to analyze to their local site. In general, at the data analysis phase, storage has to be allocated ahead of time in order to bring in a subset of the data for exploration and to store the subsequently generated data products. Furthermore, storage systems shared by a community of scientists need a common data access mechanism that allocates storage space dynamically, manages its content, and automatically removes unused data to avoid clogging the shared data stores.
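As an illustration of such subset extraction, the following sketch uses xarray to select only the variables and the region of interest before any data is moved. The file name and the variable names (sst, uwind, vwind) are assumptions made for the example, not a real dataset.

    import xarray as xr

    ds = xr.open_dataset("climate.nc")

    # Select only the variables and the Pacific region needed for the
    # analysis, instead of copying the full dataset to the analysis site.
    subset = ds[["sst", "uwind", "vwind"]].sel(
        lat=slice(-30.0, 30.0),   # tropical band
        lon=slice(120.0, 290.0),  # rough Pacific longitudes
    )

    # Write the (much smaller) subset; this is what actually gets moved.
    subset.to_netcdf("pacific_subset.nc")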
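The automatic-removal requirement can likewise be illustrated with a simple eviction sketch: when a shared store exceeds its quota, the least recently accessed files are deleted first. The directory path and quota here are illustrative assumptions.

    import os

    def enforce_quota(cache_dir, max_bytes):
        paths = [os.path.join(cache_dir, n) for n in os.listdir(cache_dir)]
        paths = [p for p in paths if os.path.isfile(p)]
        total = sum(os.path.getsize(p) for p in paths)
        # Oldest access time first: the best candidates for eviction.
        for p in sorted(paths, key=lambda p: os.stat(p).st_atime):
            if total <= max_bytes:
                break
            total -= os.path.getsize(p)
            os.remove(p)  # reclaim space held by unused data

    enforce_quota("/shared/scratch", max_bytes=500 * 2**30)  # 500 GiB quota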
When dealing with storage, another problem facing the scientist today is the need to interact with a variety of storage systems. Typically, each storage system provides different interfaces and security mechanisms. For these reasons, several activities emerged over time to standardize and streamline access to storage systems through common interfaces and to dynamically manage the storage allocation and the content of these systems. The goal is to present scientists or software utilities with the same interface regardless of the type of storage system used. Ideally, the management of storage allocation should become transparent to the client.
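A minimal sketch of such a common interface follows: an abstract allocate/put/get/release facade behind which any concrete storage system can sit. The class and method names are illustrative, not the API of any actual standard, and a toy in-memory backend stands in for a real disk or tape system.

    from abc import ABC, abstractmethod
    import uuid

    class StorageSystem(ABC):
        """Common facade: clients issue the same calls on any storage system."""

        @abstractmethod
        def allocate(self, nbytes: int) -> str:
            """Reserve space; return a token identifying the allocation."""

        @abstractmethod
        def put(self, token: str, name: str, data: bytes) -> None:
            """Store an object inside a previously allocated space."""

        @abstractmethod
        def get(self, token: str, name: str) -> bytes:
            """Retrieve an object from an allocated space."""

        @abstractmethod
        def release(self, token: str) -> None:
            """Free the allocation so its space can be reused."""

    class InMemoryStorage(StorageSystem):
        """Toy backend; a tape archive or parallel file system would plug
        in behind the same interface."""

        def __init__(self):
            self._spaces = {}  # token -> {"quota": int, "objects": dict}

        def allocate(self, nbytes):
            token = str(uuid.uuid4())
            self._spaces[token] = {"quota": nbytes, "objects": {}}
            return token

        def put(self, token, name, data):
            space = self._spaces[token]
            used = sum(len(d) for d in space["objects"].values())
            if used + len(data) > space["quota"]:
                raise IOError("allocated space exhausted")
            space["objects"][name] = data

        def get(self, token, name):
            return self._spaces[token]["objects"][name]

        def release(self, token):
            del self._spaces[token]

With such a facade, the client code that allocates space, stages data in, and releases it is identical whether the backend is a local disk, a mass-storage archive, or a remote site, which is precisely the transparency described above.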