mean data are permanently stored in data storage. Most analyses will start from
this intermediate data. The other intermediate statistics are calculated depending on
specific application demands.
18.4.2 Issues About Data Management
Data preprocessing changes data properties and generates intermediate results.
These changes are sent back to data storage for permanent archiving. The
intermediate products also need to be well organized in the database. They usually
take up additional space in data storage, so database designers need to assess the
trade-off between better performance and compact data storage. How often users
access the data and how much they retrieve should both inform that assessment.
An efficient design for scientific data management should therefore anticipate, from
the very beginning, the emergence of intermediate products and the likely demands
of data preprocessing and data usage.
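One way to make this trade-off concrete is to rank candidate intermediate products by the recomputation time they would save per gigabyte stored. The sketch below is only an illustration under stated assumptions: the `IntermediateProduct` fields, the scoring rule, and the greedy selection are hypothetical and are not taken from the source, which does not prescribe a method.

```python
from dataclasses import dataclass


@dataclass
class IntermediateProduct:
    """Hypothetical description of one intermediate product."""
    name: str
    size_gb: float             # storage needed to keep it (must be > 0)
    recompute_seconds: float   # time to regenerate it on demand
    requests_per_month: float  # how often users ask for it


def seconds_saved_per_gb(product):
    """Assumed score: monthly recompute time avoided per GB of storage."""
    return (product.requests_per_month * product.recompute_seconds
            / product.size_gb)


def choose_products_to_store(products, budget_gb):
    """Greedily keep the highest-scoring products that fit the budget."""
    chosen = []
    remaining = budget_gb
    for product in sorted(products, key=seconds_saved_per_gb, reverse=True):
        if product.size_gb <= remaining:
            chosen.append(product.name)
            remaining -= product.size_gb
    return chosen
```

A greedy ranking like this is easy to reason about but is not guaranteed to be optimal; it is meant only to show how access frequency and retrieval cost can enter the decision.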
18.4.3 Data Analysis
After preprocessing, data are ready for analysis. Data analysis is the process of
discovering patterns and useful principles. Typical data analyses for climate studies
using ModelE data include mean values, correlation between variables, stationarity
of time series, quality of forecasting, and spatiotemporal patterns (von Storch and
Zwiers 1999). Although scientists have already developed mature statistical
methods to perform these analyses, there are various challenges in dealing with big data.
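Several of the analyses listed above reduce to short formulas. The following is a minimal sketch, assuming each variable is held as a plain Python list of equal length; the function names are hypothetical, and the split-half comparison is only a crude indicator of non-stationarity in the mean, not a formal statistical test.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_mean_shift(series):
    """Mean of the second half minus mean of the first half.

    A value far from zero hints that the mean is not stationary.
    """
    half = len(series) // 2
    return fmean(series[half:]) - fmean(series[:half])


def rmse(forecast, observed):
    """Root-mean-square error, a simple measure of forecast quality."""
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(fmean(squared_errors))
```

On large gridded datasets the same quantities would normally be computed with an array library rather than Python lists; the pure-Python form is used here only to keep the formulas visible.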
Performance is a major issue, especially for on-demand analysis requests over the
Internet. For example, producing a Taylor diagram that compares the 5-year
global means of simulated data from 300 model scenarios with observation data
to assess simulation quality takes up to 900 s, including the time for calculating
the mean values (Sun et al. 2012). Techniques such as generating the intermediate
statistics discussed in the last section can speed up computation.
R (footnote 1) has developed multithreading technology that enables statistical
analysis of big data at higher performance. GPUs also have great potential for
parallel computing.
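For context, a Taylor diagram summarizes how well a simulated field matches a reference field using three related statistics: the standard deviation of each field, their correlation, and the centered root-mean-square difference. The sketch below computes them for two equally long lists of values. The function name and the unweighted (population) statistics are assumptions for illustration; this is not the implementation described by Sun et al. (2012), and real global means would typically use area weighting.

```python
import math


def taylor_statistics(model, obs):
    """Return (std_model, std_obs, correlation, centered_rms_difference)."""
    if len(model) != len(obs) or len(model) < 2:
        raise ValueError("model and obs must have the same length >= 2")
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    # Anomalies: each value with its own field mean removed.
    dev_m = [m - mean_m for m in model]
    dev_o = [o - mean_o for o in obs]
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    if std_m == 0.0 or std_o == 0.0:
        raise ValueError("correlation is undefined for a constant field")
    covariance = sum(a * b for a, b in zip(dev_m, dev_o)) / n
    correlation = covariance / (std_m * std_o)
    # Centered RMS difference compares anomalies, not raw values.
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_m, dev_o)) / n)
    return std_m, std_o, correlation, crmsd
```

The four values satisfy crmsd^2 = std_model^2 + std_obs^2 - 2 * std_model * std_obs * correlation, which is what lets the diagram show all of them as one point. Because each scenario needs only its precomputed mean field as input, storing those means as intermediate statistics removes the most expensive step from an on-demand request.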
Discovering knowledge from massive, multidimensional numerical data
is another problem. In the context of geospatial sciences, geo-visualization
(e.g., maps) has proved to be an efficient method for promptly understanding
complex geospatial data (Roth et al. 2008). Therefore, geovisual analytics,
which integrates multiple interactive tools, dynamic graphs, and live-linked views
of data representation, has the potential to provide an intuitive and heuristic
method for analyzing big data.
1 http://www.revolutionanalytics.com/products/enterprise-big-data.php