• Real-time data
• Archived data
• Field and demonstration project data
• Episodic or case study data
• Data from related disciplines (hydrology, oceanography, cryosphere, chemical
and biosphere: soil, vegetation, canopy, evapotranspiration)
• GIS databases
The first four categories, and to a lesser extent the fifth, include data from
in-situ and remote sensing observations as well as output from models. Each
discipline within the geosciences is unique and has different data needs depending
on the use or application, yet the geoscience disciplines share a common interest in
accessing data of the types listed above. For instance, the first four data types are
important for many applications in atmospheric sciences, oceanography, hydrology,
and the geologic subdisciplines of seismology and volcanology. Another common
attribute is the need for georeferencing and integration with information contained
in GIS databases. This brings up an important area of ongoing research, namely, the
development of a common data model for the geosciences, since the disciplines share
a common spatial and temporal representation of data. A discussion of data models
for the geosciences is beyond the scope of this chapter, however.
Data Deluge, Data Mining, and Knowledge Discovery
Advances in computing, modeling, and observational systems have resulted in a
dramatic increase in the volume of data. These data volumes will continue to grow
exponentially in the coming years. For example, data from current and future
observing systems will result in a 100-fold increase in volume over the next decade.
The GOES-R satellite, scheduled for launch in 2012, will have a hyperspectral sounder
with approximately 1,600 channels. In contrast, the sounders on the current
generation of GOES satellites have 18 thermal infrared channels. Similarly, each
NPOESS satellite, when fully deployed, will have raw data rates of nearly 1 terabyte
per day. Hey (2003) previews the imminent data deluge from the next generation of
simulations, sensors, modeling systems, and experiments, and discusses the importance
of metadata, the need to automate the conversion of raw data into useful information
and knowledge, and the implications for grid middleware architecture.
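To put the figures above in perspective, the short sketch below converts them into yearly terms. It assumes that the 100-fold increase is spread as steady exponential growth over exactly ten years, which the text does not state; the function names are illustrative only.

```python
import math

def annual_growth_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over the period."""
    return total_factor ** (1.0 / years)

def doubling_time_years(total_factor, years):
    """Time for the volume to double under the same constant growth rate."""
    return years * math.log(2) / math.log(total_factor)

# Assumption: "100-fold in the next decade" means steady growth over 10 years.
growth = annual_growth_factor(100.0, 10.0)
doubling = doubling_time_years(100.0, 10.0)

# Channel counts quoted in the text: about 1,600 (GOES-R) versus 18 (current GOES).
channel_ratio = 1600 / 18

print(f"annual growth: {(growth - 1) * 100:.1f}% per year")
print(f"doubling time: {doubling:.2f} years")
print(f"sounder channel ratio: about {channel_ratio:.0f}x")
```

Under these assumptions the volume grows by roughly 58% per year and doubles about every year and a half, while the sounder channel count alone rises by a factor of about 89.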
The data deluge clearly requires extraction of higher-level information useful to
users. The process of extracting this higher-level information is referred to as
data mining, and it is a key step toward data reduction and knowledge discovery.
Graves (1996) and Ramachandran et al. (1999) offer a methodology to efficiently mine
and extract content-based metadata from Earth science datasets and describe the
capabilities of the ADaM (A Data Miner) tool, which enables phenomena-oriented data
mining by incorporating knowledge of phenomena and detection algorithms in the system.
Their methodology gives users a practical way to convert data to knowledge and cope
with the data deluge. An ideal data system or service should include algorithms and
facilities for data mining that can be applied to datasets as needed by users. Future
success will depend on how well users are served by such discovery and mining tools
and by services for data integration.
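The idea of content-based metadata can be illustrated with a toy example. The sketch below is not ADaM and does not use its interface; it is a hypothetical detector, assuming a phenomenon can be approximated as a connected region of grid cells at or above a threshold. The function name `mine_events`, the record fields, and the sample grid are all invented for illustration.

```python
from collections import deque

def mine_events(grid, threshold):
    """Return one metadata record per 4-connected region of cells >= threshold."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    events = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] < threshold:
                continue
            # Breadth-first flood fill collects every cell of this region.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                i, j = queue.popleft()
                cells.append((i, j))
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and grid[ni][nj] >= threshold):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            # Summarize the region as a compact record instead of raw cells.
            events.append({
                "cell_count": len(cells),
                "peak_value": max(grid[i][j] for i, j in cells),
                "row_range": (min(i for i, _ in cells), max(i for i, _ in cells)),
                "col_range": (min(j for _, j in cells), max(j for _, j in cells)),
            })
    return events

# Hypothetical gridded field (for example, a rain rate in arbitrary units).
field = [
    [0, 0, 0, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 6, 0, 0, 0],
    [0, 0, 0, 0, 8],
]
events = mine_events(field, threshold=5)
for event in events:
    print(event)
```

The point of the sketch is the data reduction: a full grid is replaced by a few short records that a user could search (for instance, "events with peak value above 8") without reading the underlying data.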
 