Very few organisms have had their entire genome mapped, and in many cases it is difficult
to generate even moderately detailed data, since the typical genome has billions of base pairs (Sherwin,
2010). For the purposes of simple identification, research is underway into DNA barcoding, in which
short segments of DNA are used to identify species (Savolainen et al., 2005). This has
the potential to generate very large amounts of geolocated data, as each sample has some form of
geolocation attached to it.
In summary, biology offers a rich source of data for GC. The following section considers
some of the possible GC approaches to the analysis of these data.
6.3 ANALYTICAL METHODS
Many biological data can be spatially analysed using established procedures. These methods would
be considered standard practice in GC, for example, using spatial statistics (Laffan, 2006; Kulheim
et al., 2011) or geographically weighted regression (Bickford and Laffan, 2006). Indeed, many such
methods of analysis are also essentially standard practice in spatial ecology (see Fortin and Dale,
2005), and moreover, GC methods are often applied to species distribution models (Miller et al.,
2007; Franklin, 2010). This overlap should not come as a surprise as, after all, statistical and machine
learning approaches are applied across many disciplines. A simple example is the Moran's I statistic
which can be applied equally well to an analysis of human populations (Zhang et al., 2010) as it can
to relationships in a phylogenetic tree (Gittleman and Kot, 1990). The latter is essentially a calculation
that uses neighbours defined by a network, a process demonstrated for traffic networks by Okabe
et al. (2006). These approaches also have the same general limitations as other GC approaches (See,
2014), and the available tools are a mixture of proprietary and open source (Bivand, 2014).
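The point that Moran's I depends only on how neighbours are defined, whether by spatial adjacency or by links in a network or tree, can be illustrated with a minimal sketch. The function names, the binary symmetric weighting and the four-node path network below are assumptions made for illustration; they are not taken from the studies cited above.

```python
def weights_from_edges(edges):
    """Build binary, symmetric neighbour weights from undirected edges.

    Each edge (i, j) could come from spatial adjacency, a road network or a
    phylogenetic tree; the statistic itself does not care which.
    """
    weights = {}
    for i, j in edges:
        weights[(i, j)] = 1.0
        weights[(j, i)] = 1.0
    return weights


def morans_i(values, weights):
    """Global Moran's I for values indexed 0..n-1.

    weights maps (i, j) index pairs to a neighbour weight w_ij.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())          # sum of all weights
    denom = sum(d * d for d in dev)     # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * (num / denom)


# Hypothetical example: four nodes joined in a path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
w = weights_from_edges(edges)

clustered = morans_i([1.0, 1.0, 5.0, 5.0], w)    # similar values are neighbours
alternating = morans_i([1.0, 5.0, 1.0, 5.0], w)  # dissimilar values are neighbours
print(round(clustered, 3), round(alternating, 3))
```

In this toy case the clustered arrangement gives a positive value and the alternating arrangement a negative one. Replacing the edge list with links derived from a phylogeny or a street network changes the neighbour definition but not the calculation.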
There are, however, key differences in approach. Biological, and certainly ecological, analyses
often use hypothetico-deductive reasoning supported by standard statistical procedures. This is partly
a result of available data and the ability, for smaller organisms at least, of researchers to conduct
manipulative experiments (e.g. Bonser et al., 2010). Such manipulative experiments are typically
impractical for applications normal to GC, and indeed the impracticality of such approaches provides
impetus for GC research. Data mining and machine learning, important approaches in GC, are less
often used in mainstream biology. This is partly because much biological research is focussed on the
discovery of underlying processes and mechanisms and these can be difficult to extract from many
machine learning methodologies. Such methods have, however, received attention in fields such as
bioinformatics (Jensen and Bateman, 2011) and veterinary epidemiology (Ward and Carpenter, 2000).
Spatial and spatio-temporal analysis of biological data can be complex, because one of the most common
formats is a collection of geolocated species observations (see Section 6.2.4). At the spatial
precision of most data sets, individual entities are effectively co-located. Indeed, a common
approach is to aggregate the observations into some grouped unit such as a polygon and thereafter
analyse the collections of taxa found within and between each group. The focus of the following
sections is on the analysis of these types of data, which as individual data layers can be analysed
using conventional spatial analyses.
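The aggregation step described above can be sketched as follows, assuming square grid cells as the grouping unit (a polygon overlay would serve the same purpose). The coordinates, taxon labels and cell size are hypothetical, and Jaccard dissimilarity is used only as one familiar example of a between-group comparison; the text does not prescribe a particular measure.

```python
import math
from collections import defaultdict


def aggregate_to_cells(observations, cell_size):
    """Group (x, y, taxon) observations into square cells.

    Returns a dict mapping a (column, row) cell index to the set of taxa
    observed in that cell, so repeated observations of a taxon count once.
    """
    cells = defaultdict(set)
    for x, y, taxon in observations:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].add(taxon)
    return dict(cells)


def richness(cells):
    """Within-group summary: number of distinct taxa per cell."""
    return {key: len(taxa) for key, taxa in cells.items()}


def jaccard_dissimilarity(taxa_a, taxa_b):
    """Between-group summary: 1 - |A and B| / |A or B|."""
    union = taxa_a | taxa_b
    if not union:
        return 0.0
    return 1.0 - len(taxa_a & taxa_b) / len(union)


# Hypothetical observations: (x, y, taxon label).
observations = [
    (0.2, 0.7, "sp_a"),
    (0.9, 0.1, "sp_b"),
    (0.5, 0.5, "sp_a"),  # second record of sp_a in the same cell
    (1.4, 0.3, "sp_b"),
    (1.8, 0.9, "sp_c"),
    (1.1, 0.6, "sp_d"),
]

cells = aggregate_to_cells(observations, cell_size=1.0)
rich = richness(cells)
d = jaccard_dissimilarity(cells[(0, 0)], cells[(1, 0)])
print(rich, d)
```

Storing each group as a set of taxa keeps the within-group and between-group calculations separate from the choice of spatial unit, so the same summaries apply whichever grouping is used.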
Five approaches are considered. The first three are purely spatial (diversity analyses, generalised
dissimilarity modelling [GDM] and reserve design), while the last two are spatio-temporal
(disease modelling and movement analyses).
6.3.1 Diversity Analyses
Diversity analyses underpin our understanding of biodiversity and its geographic distribution. The
challenge for GC lies primarily in developing algorithms and tools that can analyse
the non-geolocated component of a biological data set (the information about organisms or
taxonomic units) while at the same time incorporating the spatial element that describes
their geographic distributions.