Database Reference
DBMS system. This may give rise to different types of data access methods
and different ways of organizing them as well.
Consider a typical database in astrophysics. The archived data include ob-
servational parameters such as the detector, the type of observation, coor-
dinates, astronomical object, exposure time, and so forth. Besides the use
of data-mining techniques to identify features, users need to perform queries
based on physical parameters such as magnitude of brightness, redshift, spec-
tral indexes, morphological type of galaxies, photometric properties, and so
forth, to easily discover the object types contained in the archive. The search
usually can be expressed as constraints on some of these properties, and the
objects satisfying the conditions are retrieved and sent downstream to other
processing steps such as statistics gathering and visualization.
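As a minimal sketch of such a search, the following Python fragment expresses a query as range constraints on physical parameters and passes the matching objects to a simple statistics step. The catalog records, the attribute names, and the helper `select` are hypothetical and chosen only for illustration; a real archive would evaluate such constraints with an index rather than a linear scan.

```python
from statistics import mean

# Hypothetical catalog records; names and values are illustrative only.
catalog = [
    {"object": "A", "type": "galaxy", "magnitude": 17.2, "redshift": 0.12},
    {"object": "B", "type": "quasar", "magnitude": 19.8, "redshift": 2.10},
    {"object": "C", "type": "galaxy", "magnitude": 21.5, "redshift": 0.45},
    {"object": "D", "type": "galaxy", "magnitude": 18.9, "redshift": 0.30},
]


def select(records, constraints):
    """Return the records whose attributes lie inside every (low, high) range."""
    return [
        r for r in records
        if all(low <= r[attr] <= high for attr, (low, high) in constraints.items())
    ]


# Constraints on physical parameters: magnitude <= 20 and 0.1 <= redshift <= 0.5.
constraints = {"magnitude": (float("-inf"), 20.0), "redshift": (0.1, 0.5)}
hits = select(catalog, constraints)

# Downstream step: gather a simple statistic over the retrieved objects.
mean_redshift = mean(r["redshift"] for r in hits)
print([r["object"] for r in hits], mean_redshift)
```

Each constraint is a closed interval on one attribute, so adding a condition on another searchable attribute only requires another entry in `constraints`.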
The datasets from most scientific domains (with the possible exception of
bioinformatics and genome data) can mostly be characterized as time-varying
arrays. Each element of the array often corresponds to some attribute of the
points or cells in two- or three-dimensional space. Examples of such attributes
are temperature, pressure, wind velocity, moisture, cloud cover, and so on in
a climate model. Datasets encountered in scientific data management can be
characterized along three principal dimensions:
Size: This is the number of data records maintained in the database. Scientific
datasets are typically very large and grow over time to be terabytes or
petabytes. This translates to millions or billions of data records. The
data may span hundreds to thousands of disk storage units and often
are archived on robotic tapes.
Dimensionality: The number of searchable attributes of the datasets may
be quite large. Often, a data record can have a large number of at-
tributes, and scientists may want to conduct searches based on dozens
or hundreds of attributes. For example, a record of a high-energy collision
in the STAR experiment [87] is about 5 MB in size, and the physicists
involved in the experiment have decided to make 200 or so high-level
attributes searchable [101].
Time: This concerns the rate at which the data content evolves over time.
Often, scientific datasets are constrained to be append-only as opposed
to frequent random insertions and deletions as typically encountered in
commercial databases.
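The three dimensions can be made concrete with a small sketch of a time-varying array. The class `TimeVaryingArray`, its methods, and the sample values below are hypothetical and not taken from any particular system. The sketch assumes a fixed two-dimensional grid, one record per cell per time step, and append-only updates.

```python
class TimeVaryingArray:
    """Append-only sequence of time steps over a fixed 2-D grid (a sketch)."""

    def __init__(self, nx, ny, attributes):
        self.nx, self.ny = nx, ny
        self.attributes = tuple(attributes)  # dimensionality: searchable attributes
        self.steps = []  # one dict per time step: attribute -> flat list of cell values

    def append(self, step):
        """Add a new time step; earlier steps are never modified or removed."""
        for name in self.attributes:
            if len(step[name]) != self.nx * self.ny:
                raise ValueError(f"{name}: expected {self.nx * self.ny} cells")
        self.steps.append({name: list(step[name]) for name in self.attributes})

    def value(self, t, attr, i, j):
        """Value of one attribute at cell (i, j) and time step t."""
        return self.steps[t][attr][i * self.ny + j]

    def record_count(self):
        """Size: one record per cell per time step."""
        return len(self.steps) * self.nx * self.ny


# Illustrative climate-like data on a 2 x 3 grid with two attributes.
grid = TimeVaryingArray(nx=2, ny=3, attributes=["temperature", "pressure"])
grid.append({"temperature": [280, 281, 282, 283, 284, 285],
             "pressure": [1010, 1011, 1012, 1013, 1014, 1015]})
grid.append({"temperature": [281, 282, 283, 284, 285, 286],
             "pressure": [1009, 1010, 1011, 1012, 1013, 1014]})
print(grid.record_count(), len(grid.attributes), grid.value(1, "temperature", 1, 2))
```

Because the only mutating operation is `append`, any index built over earlier time steps stays valid as the dataset grows, which is one reason the append-only property matters for the choice of access methods.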
Traditional DBMSs such as Oracle, Sybase, and Objectivity have had little
success in scientific data management and have found only limited
applications there. For example, a traditional relational DBMS, MySQL, is used
to manage the metadata, while the principal datasets are managed by
domain-specific DBMSs such as ROOT [18, 70]. Gray et al. [35] have argued
that managing the metadata with a nonprocedural data manipulation
language combined with data indexing is essential when analyzing scientific
datasets.
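The division of labor described here can be sketched as follows: an indexed metadata table is queried declaratively, and the result identifies which bulk data files to open. This is only an illustration. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table `event_meta`, its columns, the sample values, and the file paths are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the relational metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_meta ("
    " event_id INTEGER PRIMARY KEY,"
    " energy REAL,"
    " n_tracks INTEGER,"
    " file_path TEXT)"  # location of the bulk data, kept outside the RDBMS
)
# Index on a searchable attribute so range conditions need not scan every row.
conn.execute("CREATE INDEX idx_energy ON event_meta (energy)")

# Hypothetical metadata rows: (event_id, energy, n_tracks, file_path).
rows = [
    (1, 12.5, 340, "run01/events_000.dat"),
    (2, 25.1, 410, "run01/events_001.dat"),
    (3, 31.7, 390, "run01/events_002.dat"),
    (4, 22.3, 520, "run02/events_000.dat"),
]
conn.executemany("INSERT INTO event_meta VALUES (?, ?, ?, ?)", rows)

# Nonprocedural query: state the conditions, not how to evaluate them.
query = (
    "SELECT file_path FROM event_meta "
    "WHERE energy > ? AND n_tracks >= ? ORDER BY event_id"
)
files = [path for (path,) in conn.execute(query, (20.0, 400))]
print(files)
conn.close()
```

Only the files named in `files` would then be read by the domain-specific data manager, so the bulk data is touched only for events that satisfy the metadata conditions.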