and Simple Storage Service (S3), Azure Services Platform, DynDNS, Google
Compute Engine, and HP Cloud. Users can dynamically provision the
virtualized hardware resources and configure them. One advantage of
IaaS over the other models is that the cloud user has more control over
the logic for distributing the tasks needed to complete a single service
request or multiple requests. Servers can be dynamically provisioned at peak
request times and de-provisioned thereafter (see the sketch below). Similarly,
for long-running processes, servers can be provisioned as long as the task
scheduling and distribution logic is known. In addition to the EC2 and S3 cloud
platforms, Amazon also offers Amazon Elastic MapReduce (EMR), which
uses the Hadoop framework (Hadoop Distributed File System (HDFS),²³
MapReduce,²⁴ Pig,²⁵ Hive,²⁶ etc.) for big data storage and analytics.
To test the use of IaaS clouds for Geosciences applications, Huang
et al. (2010) deployed the Global Earth Observation System of Systems
(GEOSS) Clearinghouse metadata catalog service on Amazon EC2. Similarly,
Baranski et al. (2010) proposed a pay-per-use revenue model for geoprocessing
services in the cloud, in support of future business models for such services.
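As a concrete illustration of the provisioning and de-provisioning described above, the following sketch uses the AWS SDK for Python (boto3) to start extra EC2 worker servers for a peak period and terminate them afterwards. The AMI ID, instance type, tag values, and worker counts are illustrative assumptions, not values taken from the studies cited above.

import boto3

# Minimal sketch of elastic provisioning on an IaaS cloud (Amazon EC2 via the
# boto3 SDK). The AMI ID, instance type, and tag values are placeholder
# assumptions.
ec2 = boto3.client("ec2", region_name="us-east-1")

def scale_out(n_workers):
    """Provision n_workers worker servers for a peak in request load."""
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # hypothetical machine image
        InstanceType="m5.large",           # hypothetical instance size
        MinCount=n_workers,
        MaxCount=n_workers,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "geoprocessing-worker"}],
        }],
    )
    return [i["InstanceId"] for i in response["Instances"]]

def scale_in(instance_ids):
    """De-provision the extra servers once the peak has passed."""
    if instance_ids:
        ec2.terminate_instances(InstanceIds=instance_ids)

# Add two workers for a peak period, dispatch tasks according to the known
# scheduling/distribution logic, then release the servers.
worker_ids = scale_out(2)
# ... distribute tasks to the new workers ...
scale_in(worker_ids)

In a real deployment this logic would more likely be delegated to an auto-scaling service rather than hand-written polling, but the underlying provision and de-provision calls are the same.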
In situ processing
In situ data processing denotes the ability to access data directly “in-place”,
without having to import it into a database beforehand (Alagiannis et al.
2012). Complex, ad hoc analytics can therefore be performed directly on
external data sources, avoiding the pre-loading step and all the overhead it
incurs. In certain situations in situ processing is preferable, and it is
sometimes the only practical way to work with the data, even though it can be
slower than in-database processing because the engine cannot adapt the data
layout to its I/O access patterns or apply its internal optimizations.
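A minimal sketch of what such in-place access can look like in practice is given below, using DuckDB's ability to run SQL directly over external files; the file path and column names are hypothetical, and comparable facilities (external or foreign tables) exist in other engines.

import duckdb

# Minimal sketch of in situ processing: an ad hoc aggregation runs directly
# against an external file, with no prior import into the database.
# The file path and column names are hypothetical.
result = duckdb.sql("""
    SELECT sensor_id,
           avg(temperature) AS mean_temp,
           count(*)         AS n_readings
    FROM 'archive/observations_2023.csv'   -- scanned in place, never loaded
    WHERE quality_flag = 0
    GROUP BY sensor_id
    ORDER BY mean_temp DESC
""").fetchall()

for sensor_id, mean_temp, n_readings in result:
    print(sensor_id, round(mean_temp, 2), n_readings)

Because the external file is parsed on every query, repeated analyses pay the scan cost each time; this is one source of the slowdown relative to in-database processing noted above, since the engine cannot reorganize the data to suit its preferred I/O access patterns.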
In situ processing is most useful when working with existing, legacy
data archives where the data is already stored in a certain structure and many
services are built to assume that structure. Modifying the data archive is
not an option, and importing it into a database leads to unnecessary data
duplication. In situ processing is non-invasive, so a database with such
23 Similar to the Google File System, HDFS is a scalable, fault-tolerant, distributed file system designed to run on commodity hardware.
24 MapReduce is a software framework and programming model for easily writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.
25 Pig is a platform for analyzing large data sets that consists of a high-level language for expressing analysis programs over data stored in Hadoop.
26 Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.