will range from highly unstructured to highly structured data. These factors will
require even more multidisciplinary collaboration among agency scientists.
Warehousing and Mining
As increasingly large amounts of data continue to be generated through
designated systems—such as environmental monitoring, biomarker and other
exposure surveillance data, disease surveillance, and designed epidemiologic
and experimental studies—or streamed from community crowdsourcing, EPA is
faced with both the opportunity and the challenge of channeling and integrating
those data into a massive “data warehouse.” Data warehousing is a well-developed
concept and a common practice in business (Miller et al. 2009). Within EPA, the
adaptation of and transition to data warehousing will continue to evolve, guided
by good protocols such as EPA's Envirofacts Warehouse (Pang 2009; Egeghy et
al. 2012) and the Aggregated Computational Toxicology Resource (Egeghy et
al. 2012; Judson et al. 2012). In the future, data in EPA's warehouse will come
from diverse sources, from multiple media, and across geographic, physical, and
institutional boundaries. Recent efforts to integrate the US Geological Survey's
National Water Information System with EPA's Storage and Retrieval System
are an example (Beran and Piasecki 2009). Integrating heterogeneous databases
and mining the resulting massive datasets present new opportunities to harvest
information relevant to EPA's science and regulatory activities. A recent
application involving the European Union's Water
Resource Management Information System is a case in point (Dzemydienė et al.
2008).
Data mining has become a standard approach in business for analyzing massive,
multisource, heterogeneous data on consumer behavior (Ngai et al. 2009).
EPA can and should adopt this data-analytic paradigm to support its
knowledge-discovery process. The paradigm is increasingly important at a time when the
discovery of new evidence or a new data model can be bolstered by dynamic
mining of large amounts of data, including environmental indicators of air and
water, satellite imagery of climate change from representative population
databases, health indicators from disease surveillance systems and medical
databases, social behavioral patterns, individual lifestyle data, and -omics data and
disease pathways. Doing so will require EPA to invest in the continued
development of new analytic and computational methods for static
datasets (for example, modeling of complex biologic systems and air and water
models) and to adapt and develop data-mining techniques to process,
visualize, link, and model the massive amounts of data streaming from
multiple sources. EPA is making progress in that direction with its Aggregated
Computational Toxicology Resource (Judson et al. 2012). Successful cases
have also been reported for ecologic modeling (Stockwell 2006), air-pollution
management (Li and Shue 2004), and toxicity screening (Helma et al. 2000;
Martin et al. 2009), to name a few.
 