Although this list is far from comprehensive, most GC research could gain speed improvements simply by using more efficient algorithms or by making better use of existing parallel, distributed and cloud computing resources, akin to some of the solutions of Parry and Bithell (2012) and Adnan et al. (2014). It is the responsibility of the GC community to become more involved in initiatives such as the NCeSS (National Centre for e-Social Science - http://www.merc.ac.uk/) and the NCRM (National Centre for Research Methods - http://www.ncrm.ac.uk/) in the United Kingdom, projects like the EESI (European Exascale Software Initiative - http://www.eesi-project.eu/pages/menu/publications/investigation-of-hpc-initiatives.php) in Europe and similar initiatives in other countries in order to make the best use of what is currently available. See Birkin and Malleson (2014) for more details. Although quantum computing is on the horizon, it is not clear when it will begin to have a significant impact on research.
18.3 DATA LIMITATIONS
Back in 1995, Openshaw (1995b) clearly recognised that we were experiencing a spatial data explosion, the rate of which would only continue to increase in the future. In 2002, it was estimated that the annual volume of new information created through both physical media and electronic channels was 23 EB, or 23 trillion MB (Lyman et al. 2003). Of this total, 18 EB was digital and roughly 170 TB was produced through the Internet; the report equated the volume of information from the Internet to the print collection held in 17 US Libraries of Congress. In 2006, the amount of digital information produced annually was estimated at 161 EB (International Data Corporation [IDC] 2007), or 3 million times the amount of information contained in all the books ever written, while estimates for 2011 stand at more than 1 ZB, or 1 trillion GB, per year (International Data Corporation [IDC] 2011). Unfortunately, there is no indication of what percentage of this digital universe is georeferenced, but even if it were only 10% of the total, this would still represent an enormous amount of spatial data produced annually. In fact, the percentage is likely to be much higher than that, given the increasing move towards a mobile environment and the rapidly expanding number of physical and human sensors collecting data on a continuous basis. We have now entered the era of big data, although the exact origin of this term is unclear. McBurney (2012) provides an interesting analysis, finding the first reference to the term in an economics paper published in 2000. However, it was not until 2009 that the legitimacy of the term was debated on Wikipedia, and Google Trends shows interest in the term taking off only from the end of 2010.
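The scale of these figures is easier to grasp with a quick back-of-envelope check. The short Python sketch below simply verifies the unit conversions quoted above and the 10% georeferenced share, assuming decimal (SI) prefixes; the 10% figure is the illustrative assumption used in the text, not a measured value.

# Back-of-envelope check of the storage figures quoted above,
# assuming decimal (SI) prefixes: 1 EB = 10**18 bytes, 1 ZB = 10**21 bytes.
MB, GB, EB, ZB = 10**6, 10**9, 10**18, 10**21

print(23 * EB / MB)   # 23 EB expressed in MB -> 2.3e13, i.e. 23 trillion MB
print(1 * ZB / GB)    # 1 ZB expressed in GB  -> 1e12, i.e. 1 trillion GB

# Illustrative share of georeferenced data: if only 10% of a 1 ZB annual
# digital universe were georeferenced, that would still be about 100 EB
# of spatial data per year.
print(0.1 * ZB / EB)  # -> 100.0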
Big data, which include big spatial data, generally refer to datasets that are so large that conventional systems struggle to process the volume of information coming in (Dumbill 2012). Big data can be characterised by the three Vs - velocity, volume and variability (or variety) - which pose a number of common challenges to all fields including GC. The first of these is the enormous velocity of acquisition: the experiments at CERN, for example, generate more than 1 PB of data per second, which illustrates both volume and velocity very clearly. This input stream then requires filtering in order to keep only the data of interest (Worth 2011); a minimal sketch of such filtering is given below. Such data rates also raise some interesting storage and preservation challenges, as no satisfactory solution currently exists for preserving all the digital data being created. Goodchild et al. (2012) recognise the need for extensive research in this area if big spatial data - which encompass the Digital Earth - are to be preserved for more than just a few years. The volume of big data refers not only to sheer size but also to the degree of interlinkage between datasets; the linked open data cloud diagram of Cyganiak and Jentzsch (2011) has been suggested by Janowicz (2012) as a promising knowledge infrastructure for big data. The final V is variability or variety, which refers to the increasing number of data sources and data types emerging, much of the data being unstructured.
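As a purely illustrative sketch of the filtering idea mentioned above (the data source, field names and thresholds here are hypothetical assumptions, not taken from any cited system), a high-velocity stream of georeferenced readings can be reduced on the fly so that only the small fraction of interest is ever retained:

import random
from typing import Dict, Iterator

def sensor_stream(n: int) -> Iterator[Dict[str, float]]:
    # Simulate a stream of georeferenced sensor readings (hypothetical data).
    for _ in range(n):
        yield {"lat": random.uniform(-90.0, 90.0),
               "lon": random.uniform(-180.0, 180.0),
               "value": random.gauss(0.0, 1.0)}

def of_interest(record: Dict[str, float]) -> bool:
    # Illustrative predicate: keep readings inside a rough bounding box
    # around the United Kingdom whose values are unusually large.
    in_box = 49.0 <= record["lat"] <= 61.0 and -8.0 <= record["lon"] <= 2.0
    return in_box and abs(record["value"]) > 2.0

# Filter on the fly: the full stream is never stored, only the records kept.
retained = [r for r in sensor_stream(1_000_000) if of_interest(r)]
print(f"{len(retained)} of 1,000,000 records retained")

In a real system the predicate, the stream and the storage back end would of course be far more complex, but the principle of discarding unwanted data as early as possible in the pipeline is the same.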
Other challenges of big data include the need to document the data in ways that are useful to users (metadata/semantics/use cases) and the need to improve search facilities so that users can find what they need more efficiently, for example, through the use of semantically enabled search engines. The need to better manage the data will become more important, for example,