not under the management of traditional DBMSs, but merely appear
as a collection of files under a certain directory structure or following certain
naming conventions. Usually, the files follow a format or schema agreed upon by
the domain scientists.
An example of such a scientific dataset with hundreds of attributes is the
data from the High Energy Physics (HEP) STAR experiments [87], which
maintain billions of data items (referred to as events) with over a hundred
attributes. Most of the data files are in a format called ROOT [18, 70].
Searching for a subset of the billions of events that satisfy some conditions
based on a small number of attributes requires special data-handling
techniques beyond those of traditional database systems. In this chapter, we
specifically address some of the techniques for efficiently searching through
massive scientific datasets.
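To make the shape of such a query concrete, the following is a minimal sketch of selecting events by conditions on a few attributes. It assumes a hypothetical column-wise layout (one list per attribute) and made-up attribute names (`energy`, `n_tracks`, `charge`); it is a plain scan meant only to illustrate the query, not one of the search techniques this chapter goes on to describe, and it does not read ROOT files.

```python
from typing import Callable, Dict, List

# Hypothetical column-wise layout: one list per attribute, all of equal length.
# Attribute names and values are invented for illustration only.
events: Dict[str, List[float]] = {
    "energy":   [1.2, 7.5, 3.3, 9.1, 0.4],
    "n_tracks": [12, 40, 8, 55, 3],
    "charge":   [1, -1, 1, 1, -1],
}


def select_events(
    columns: Dict[str, List[float]],
    conditions: Dict[str, Callable[[float], bool]],
) -> List[int]:
    """Return the row indices of events that satisfy every condition.

    Only the columns named in `conditions` are read; the remaining
    attributes are never touched.
    """
    n_rows = len(next(iter(columns.values())))
    matches = list(range(n_rows))
    for name, predicate in conditions.items():
        column = columns[name]
        # Narrow the candidate set one attribute at a time.
        matches = [i for i in matches if predicate(column[i])]
    return matches


# Conditions on two of the three attributes; "charge" is never read.
hits = select_events(events, {
    "energy": lambda e: e > 3.0,
    "n_tracks": lambda n: n >= 40,
})
print(hits)
```

The point of the sketch is that the query names only a small number of attributes, so a layout that lets the search skip the rest avoids reading most of the data. A full scan of even those few columns still grows linearly with the number of events, which is why very large datasets call for more specialized techniques.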
The need for efficient search and subset extraction from very large datasets
is motivated by the requirements of numerous applications in both scientific
domains and statistical analysis. Here are some such application domains:
• high-energy physics and nuclear data generated by experiments and simulations
• remotely sensed or in situ observations in the earth and space sciences (e.g., data observations used in climate models)
• seismic sounding of the earth for petroleum geophysics (or similar signal processing endeavors in acoustics/oceanography)
• radio astronomy, nuclear magnetic resonance, synthetic aperture radar, and so forth
• large-scale supercomputer-based models in computational fluid dynamics (e.g., aerospace, meteorology, geophysics, astrophysics), quantum physics, chemistry, and so forth
• medical (tomographic) imaging (e.g., CAT, PET, MRI)
• computational chemistry
• bioinformatics, bioengineering, and genetic sequence mapping
• intelligence gathering, fraud detection, and security monitoring
• geographic mapping and cartography
• census, financial, and other statistical data
Some of these applications are discussed in References 36, 92, and 95.
Compared with the traditional databases managed by commercial DBMSs, one
immediate distinguishing property of scientific datasets is that there is almost
never any simultaneous read and write access to the same set of data records.
Most scientific datasets are read-only or append-only. Therefore, there is a
potential to significantly relax the ACID* properties observed by a typical
* Atomicity, consistency, isolation and durability