the data is stored and from interacting with the workflow system directly. The
analysis facility should take care of the storage of the data, execution of the
workflow, translation of formats between steps to match each step's input
and output requirements, and generation of products for the user to track
and monitor progress of the analysis process. This suggests the concept of an
“abstract workflow” that is mapped into “concrete (or executable) workflows”
as described in Chapter 13. We expect such facilities to become the norm in
conducting scientific analysis.
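The mapping from an abstract workflow to a concrete one can be sketched in a few lines. This is only an illustration under stated assumptions, not the mechanism described in Chapter 13: the names `AbstractStep` and `to_concrete`, the step names, and the format labels are all hypothetical. The sketch shows the facility, rather than the user, inserting format conversions wherever one step's output does not match the next step's input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    """Hypothetical abstract step: what it does and which formats it reads and writes."""
    name: str
    input_format: str
    output_format: str


def to_concrete(steps, stored_format):
    """Expand abstract steps into an executable plan, adding format conversions.

    `stored_format` is the (assumed) format in which the facility holds the data.
    """
    plan = []
    current = stored_format
    for step in steps:
        if step.input_format != current:
            # The facility bridges mismatched formats; the user never sees this.
            plan.append(f"convert {current} -> {step.input_format}")
        plan.append(f"run {step.name}")
        current = step.output_format
    return plan


# What the scientist specifies: only the analysis steps (illustrative names).
abstract_workflow = [
    AbstractStep("regrid", "netcdf", "netcdf"),
    AbstractStep("summarize", "csv", "csv"),
    AbstractStep("plot", "csv", "png"),
]

# What the facility would execute: the same steps plus any needed conversions.
concrete_plan = to_concrete(abstract_workflow, stored_format="netcdf")
for action in concrete_plan:
    print(action)
```

In this sketch the user's specification never mentions conversions; the single `convert` action appears only in the concrete plan, between the steps whose formats differ.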
In general, data-side analysis facilities will not eliminate replicating data.
One would expect that important data, which large communities share, will be
replicated to multiple sites, each providing its own data-side analysis facility.
For example, climate modeling data generated by long runs on supercomputers
will most likely be mirrored to multiple sites worldwide. Similarly, it
is expected that some subsets of data will still be moved to scientists' sites,
as the cost of cluster hardware continues to fall and networking speeds grow.
As the use of cloud computing and storage grows, it is expected that data-side
analysis will be offered on cloud facilities as well.
Scientific Database Management Systems
Historically, the concept of separating the logical organization of the data from
its physical organization dominated the development of database management
systems (DBMSs). This is referred to as “physical data independence”. The
logical organization refers to “what” the structure of the data is, and the
physical organization refers to “how” the data is stored and organized on
physical media, including memory and disk. In order to access the data faster,
different types of storage organization and indexes were invented, which did
not affect the logical organization of the data. Such concepts brought about
the use of DBMSs in many areas, especially in business and commercial
application domains.
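Physical data independence can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table `readings`, its columns, the sample values, and the index name `idx_station` are assumptions made up for this example. The same logical query is run before and after an index is added: the physical organization changes, but the query text and its result do not.

```python
import sqlite3

# In-memory database with a small, made-up table of temperature readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, year INTEGER, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("A", 2001, 14.2), ("B", 2001, 9.8), ("A", 2002, 14.6), ("B", 2002, 10.1)],
)

# The logical query: it names tables and columns, never storage structures.
query = "SELECT year, temp_c FROM readings WHERE station = ? ORDER BY year"


def plan(sql, params):
    """Return SQLite's description of how it would physically execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


rows_before = conn.execute(query, ("A",)).fetchall()
plan_before = plan(query, ("A",))

# Change only the physical organization by adding an index.
conn.execute("CREATE INDEX idx_station ON readings (station, year)")

rows_after = conn.execute(query, ("A",)).fetchall()
plan_after = plan(query, ("A",))

print("same result:", rows_before == rows_after)
print("plan before:", plan_before)
print("plan after: ", plan_after)
```

The exact wording of the plans varies between SQLite versions, so it is not reproduced here; the point is that only the plan changes, while the query and its answer stay the same.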
The dominant DBMS today is still the relational database system.
Its simple data model represents data as tables, where rows represent
instances of objects (such as people, books, etc.) and columns represent
properties (or attributes) of the objects, which made it very attractive to
many applications. However, by and large, relational database systems have not
been used extensively by scientific applications. Instead, most large
scientific datasets are stored as files in specific standard file formats,
such as NetCDF and HDF5. There are several reasons for this state of affairs.
First, scientists want to exchange data by simply sending each other files.
Once a community agrees on a standard file format, and even includes the
metadata in the header of the files, the files become “self-describing”.
Second, there is