To enable such large-capacity data storage and processing, new technologies like Hadoop
are gaining importance. These technologies have been adopted by organizations that
typically deal with large geospatial data silos interconnected within a distributed
computing environment, and several ecosystems are being developed around these
technologies across a range of applications. Hadoop is built on two core
components: a distributed file system (HDFS) for storage and MapReduce for
computation. Large datasets are stored in HDFS, spread across different data nodes.
The MapReduce model is responsible for decomposing a job submitted by a client
into smaller tasks and distributing them in parallel to the different computing
nodes. The tasks are scheduled such that each node can access its own data locally,
avoiding large data transfers between the computing nodes over the network.
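To make the map and reduce stages concrete, the following minimal sketch emulates the MapReduce flow in plain Python, without a Hadoop cluster: a map function emits key-value pairs, the pairs are grouped by key (the shuffle step, which Hadoop performs across the network), and a reduce function aggregates each group. The word-count task and all function names here are illustrative, not part of Hadoop's API.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map stage: emit a (key, value) pair for each word in the line
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # Reduce stage: aggregate all values that share the same key
    return word, sum(counts)

def map_reduce(lines):
    # Run the map stage over all input records
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle: group intermediate pairs by key (done here by an
    # in-memory sort; Hadoop does this across the cluster)
    pairs.sort(key=itemgetter(0))
    # Reduce stage: one call per distinct key
    return [reducer(key, (v for _, v in group))
            for key, group in groupby(pairs, key=itemgetter(0))]

if __name__ == '__main__':
    data = ['ndvi landsat ndvi', 'landsat hadoop']
    print(map_reduce(data))
    # [('hadoop', 1), ('landsat', 2), ('ndvi', 2)]
```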
An example of a cloud computing platform is the Google Earth Engine for process-
ing satellite imagery. It provides access to a large archive of satellite imagery, including
Landsat, which has enabled the impressive production of High-Resolution Global Maps of
21st-Century Forest Cover Change (Hansen et al. 2013). Software developers can
interact with the Google Earth Engine through the earthengine API, available for
JavaScript and Python.
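As a minimal sketch of the Python API, the example below filters a Landsat 8 collection and reduces it to a median composite on Google's servers. The asset identifier and the point coordinates are assumptions made for the example, and ee.Initialize() requires a registered Earth Engine account.

```python
import ee

# Initialize the Earth Engine session (requires prior
# authentication with a registered account).
ee.Initialize()

# Region of interest: a single point (coordinates are
# illustrative only).
point = ee.Geometry.Point(4.35, 50.85)

# Landsat 8 top-of-atmosphere collection; this asset identifier
# is an assumption and may differ between catalog versions.
collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
              .filterBounds(point)
              .filterDate('2014-01-01', '2014-12-31'))

# Reduce the filtered collection to a single median composite;
# the computation runs server-side.
composite = collection.median()

# Retrieve basic metadata from the server.
print(composite.bandNames().getInfo())
```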
Meanwhile, another evolving technology relates to raster databases. Examples
include Rasterlite (see also Sect. 2.7.2) and Rasdaman, both of which are
already supported by GDAL. It is not yet clear how quickly raster databases will gain
popularity, but they already offer some clear advantages over traditional
raster files. For instance, databases offer more flexibility for spatial
queries, multi-user access and data sharing. In addition, spatial indexing and adaptive
tiling can provide fast data access for a particular region of interest. Loading
large data files into a database can take time, but this needs to be done only once.
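Because GDAL exposes Rasterlite through a connection string, a raster stored in the database can be read like any other GDAL dataset. The sketch below assumes a SQLite file rasters.sqlite containing a raster table named dem; both names are illustrative.

```python
from osgeo import gdal

gdal.UseExceptions()

# Open a raster stored in a Rasterlite (SQLite) database via
# GDAL's connection-string syntax. The database file and table
# name are illustrative assumptions.
ds = gdal.Open('RASTERLITE:rasters.sqlite,table=dem')

# From here on the dataset behaves like any other GDAL raster:
# read a 256x256 window without touching the rest of the table.
band = ds.GetRasterBand(1)
window = band.ReadAsArray(xoff=0, yoff=0, win_xsize=256, win_ysize=256)
print(ds.RasterXSize, ds.RasterYSize, window.shape)
```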
19.2 Anticipated EO Data and Related Software Requirements
One of the main advantages of the command line utilities that we have described is
that they can easily be integrated into scripts for batch processing. This is becoming
more important with the advent of the large image datasets that are increasingly
available from a range of airborne and space-borne sensors. For instance, when
the USGS made the Landsat archive freely available, it opened up a wide range
of applications for monitoring the environment over several decades. Furthermore,
Landsat 8 is continuing this trend of freely available multi-spectral data.
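As a simple illustration of such batch processing, the following sketch loops over a directory of Landsat GeoTIFF scenes and reprojects each one with gdalwarp; the directory layout and the target projection are assumptions made for the example.

```python
import glob
import subprocess

# Reproject every Landsat GeoTIFF in the input directory to
# UTM zone 31N. Paths and the target EPSG code are illustrative.
for src in glob.glob('landsat/*.tif'):
    dst = src.replace('.tif', '_utm.tif')
    subprocess.run(['gdalwarp', '-t_srs', 'EPSG:32631', src, dst],
                   check=True)
    print('reprojected', src, '->', dst)
```

The same loop could wrap any of the command line utilities described in this book, which is precisely what makes them attractive for processing long time series of imagery.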
The European Space Agency is adopting a similar business model with the
launch of Sentinel-1A in April 2014 and the future Sentinel missions, particularly
Sentinel-2, due to be launched in 2015. The image data emanating from these missions
will be freely available at high spatial and temporal resolutions, and command line
utilities will provide an effective means of harnessing information and knowledge
from these data sources. Similarly, constellations of sensors like Pléiades, DMC and
RapidEye are proving to be more flexible with respect to image data collection.
 