Large Datasets
Informatics, data warehousing, and data-mining afford EPA powerful
tools for making maximal use of the wealth of information that will continue
to be gathered by it, other agencies, and the public on an unprecedented scale.
Data analysis and modeling in many cases will be accomplished through
informatics techniques, as is already the case in the analysis of -omics data
(Ng et al. 2006; Baumgartner et al. 2011; Roy et al. 2011). As EPA moves forward
with analyzing and modeling large sets of data, it should keep three points in mind:
Information generation and information gathering are accelerating
exponentially, and EPA will not be able to generate all the data needed to address
complex environmental and health problems. It would benefit the agency to
continue to develop its capacity to access, harvest, manage, and integrate data
from diverse sources and different media and across geographic and disciplinary
boundaries rapidly and systematically.
Links between environmental change, exposure, human behavior, and
human health are complex, and seamless integration and dynamic mining of
diverse datasets will boost the chance of discovering such links. For example, to
derive personal exposure estimates for particulate matter smaller than 2.5 µm in
diameter (PM2.5), it is necessary to integrate environmental data, human
behavioral data, and insight about how PM2.5 penetrates various indoor
microenvironments. The exposure estimates are then linked to disease-mechanism
data and health data. Such an approach is not difficult to appreciate in principle,
but its practice hinges on how successfully an informatics approach can be adapted
to mine the massive data from diverse systems. EPA has been a leader in research
on air quality and the associated health effects of exposure to air pollutants, as
showcased through its contributions to the Six Cities Study (Dockery et al.
1993) and the National Morbidity, Mortality, and Air Pollution Study (Samet et
al. 2000; Dominici et al. 2006), and it is in a strong position to retain its
cutting-edge position by adapting informatics approaches to the analysis and
modeling of diverse and massive datasets.
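The PM2.5 example above can be illustrated with a minimal sketch of a time-weighted microenvironment calculation, one common way of combining ambient concentrations, time-activity (behavioral) data, and infiltration into a single personal exposure estimate. This is not a method taken from the text: the class and function names, the infiltration factors, the hours, and the ambient concentration are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Microenvironment:
    """One place a person spends time during the day (illustrative structure)."""
    name: str
    hours: float                 # time spent there (behavioral data)
    infiltration: float          # fraction of ambient PM2.5 that penetrates (hypothetical)
    indoor_source: float = 0.0   # PM2.5 from indoor sources, in ug/m3 (hypothetical)


def personal_exposure(ambient_ug_m3, microenvironments):
    """Time-weighted average PM2.5 exposure across microenvironments, in ug/m3."""
    total_hours = sum(m.hours for m in microenvironments)
    if total_hours <= 0:
        raise ValueError("total time across microenvironments must be positive")
    # Concentration in each microenvironment = infiltrated ambient + indoor sources.
    weighted = sum(
        m.hours * (m.infiltration * ambient_ug_m3 + m.indoor_source)
        for m in microenvironments
    )
    return weighted / total_hours


# Hypothetical 24-hour activity pattern and ambient monitor reading.
day = [
    Microenvironment("home", hours=16.0, infiltration=0.5),
    Microenvironment("office", hours=6.0, infiltration=0.4),
    Microenvironment("outdoors", hours=2.0, infiltration=1.0),
]
estimate = personal_exposure(ambient_ug_m3=12.0, microenvironments=day)
print(f"Estimated personal PM2.5 exposure: {estimate:.1f} ug/m3")
```

In practice each input would come from a different system (monitoring networks, activity surveys, building studies), which is where the data-integration challenge described above arises.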
As environmental challenges continue to emerge and evolve, EPA's
approach to problem-solving will need to be dynamic and adaptive. Having a
cutting-edge capacity for data warehousing, data-mining, bioinformatics,
environmental informatics, and health informatics will boost EPA's ability to
integrate massive external data in a timely fashion, to adopt new techniques, to
borrow scientific and technical expertise from outside the agency, and to be more
responsive and anticipatory.
As EPA continues to strengthen its informatics infrastructure, it will be
important to pay attention to new analytic and statistical methods to address
emerging modeling issues and to bridge methodologic gaps. Several outstanding
issues warrant high priority. One challenge is to analyze large amounts of data