performance is built on a PostGIS database to support data discovery; both
local and remote access functions are provided so that users can search, present,
and use geospatial data.
The GEOSS clearinghouse is the engine of GEOSS and provides discovery and
access functions for big data to users worldwide; hence, it needs to scale well
enough to support a reasonable number of concurrent search requests. To meet
this requirement, the GEOSS clearinghouse is deployed on the Amazon EC2 cloud.
Using cloud resources, the clearinghouse can deliver higher performance and
serve more concurrent requests through improved flexibility and scalability
(Huang et al. 2010).
18.4 Big Data Processing
Because of the complexity and large volume of big data, a single computing
resource cannot produce and handle the data adequately. After a scientific model
is developed, the model needs to be sent to different computing platforms for
simulation. Results or observations are then collected and stored in distributed
data storage. Big data processing covers the further operations on these data,
including data preprocessing, management of intermediate products, and data
usage (analysis and visualization, Fig. 18.4). We use the processing of ModelE
data from the climate@home project as an example to explain how data processing
is handled.
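The three stages named above (preprocessing, management of intermediate products, and data usage) can be illustrated with a minimal Python sketch. It is not the climate@home implementation: the `Record` structure, the variable name `tas`, the unit codes, and all function names are hypothetical, chosen only to show how the stages hand data to one another.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Record:
    """One raw model output value (hypothetical structure, not ModelE's format)."""
    run_id: str
    variable: str
    unit: str                # this sketch only understands "K" and "C"
    value: Optional[float]   # None marks a missing value


def preprocess(records):
    """Stage 1: drop unusable records and convert temperatures to kelvin."""
    cleaned = []
    for r in records:
        if r.value is None:
            continue  # missing value: unusable
        if r.unit == "C":
            cleaned.append(Record(r.run_id, r.variable, "K", r.value + 273.15))
        elif r.unit == "K":
            cleaned.append(r)
        # records in any other unit are skipped in this sketch
    return cleaned


def store_intermediate(records):
    """Stage 2: organize intermediate products by (run, variable)."""
    store = {}
    for r in records:
        store.setdefault((r.run_id, r.variable), []).append(r.value)
    return store


def analyze(store):
    """Stage 3: a trivial analysis, the mean value per run and variable."""
    return {key: mean(values) for key, values in store.items()}


def run_pipeline(raw):
    """Chain the three stages in workflow order."""
    return analyze(store_intermediate(preprocess(raw)))
```

In a real deployment each stage would typically run on distributed storage and computing resources rather than in-memory lists; the sketch only fixes the order of operations and the hand-off between stages.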
18.4.1 Data Preprocessing
Raw data returned to storage are usually poorly organized, fragmented, and not
directly usable. They may differ in format, projection, and spatial or temporal
coverage. Therefore, data preprocessing such as reformatting, re-projecting, scaling,
Fig. 18.4 Typical big spatiotemporal data processing workflow