species richness. This model proposal is then fitted to the data. In machine learning,
we propose only the set of predictors, but not the model structure. Here, an algorithm
builds a model proposal, fits it to a part of the data set and evaluates its performance
on the other part of the data. It then proposes a modification of the original model and
so forth. Machine-learning algorithms 11 differ in scope, origin, complexity, and
speed, but they all share this validation step which is used to steer the algorithm
towards a better model formulation. There are plenty of studies comparing different
modelling approaches (Guisan et al. 2007; Meynard and Quinn 2007; Pearson et al.
2006; Segurado and Araújo 2004). Rather than adding to them, we shall continue using
GLM and BRT as representatives of the two most commonly used approaches.
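The generic loop described above — propose a model, fit it to one part of the data, score it on the other part, and keep a modification only if the held-out score improves — can be sketched in a few lines. This is a minimal illustration on simulated data using forward predictor selection with ordinary least squares; the data, predictor names, and split are all invented for the example, not taken from any real SDM workflow:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "species richness" data: only predictors 0 and 2 carry signal.
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

train, valid = slice(0, 150), slice(150, 200)  # fitting part / validation part

def valid_mse(cols):
    """Fit by least squares on the training part, score on the held-out part."""
    if not cols:
        pred = np.full(50, y[train].mean())
    else:
        beta, *_ = np.linalg.lstsq(X[train][:, cols], y[train], rcond=None)
        pred = X[valid][:, cols] @ beta
    return float(np.mean((y[valid] - pred) ** 2))

# The algorithm's loop: propose a modified model, refit, keep it only if
# performance on the held-out data improves; stop when nothing improves.
current, best = [], valid_mse([])
improved = True
while improved:
    improved = False
    for j in range(X.shape[1]):
        if j in current:
            continue
        score = valid_mse(current + [j])
        if score < best:
            current, best, improved = current + [j], score, True

print(sorted(current))
```

Real machine-learning algorithms (boosting, random forests, neural networks) differ enormously in how they propose modifications, but the validation-steered loop is the common skeleton.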
The choice of model type has much to do with availability of software, current
fashion and, of course, with the specific aim of the study. Further complications arise
if the survey design requires a mixed-model approach (e.g. due to repeated
measurements or surveys split across observers), if spatial autocorrelation needs to be
addressed, if zero-inflated distributions have to be employed, or if corrections for
detection probability shall be modeled. The more such requirements are imposed
on the model, the more likely GLMs become the only feasible method. 12 Alternatively, you
may want to go for a Bayesian SDM (see Latimer et al. 2006, for a primer).
If your data and model require an unusual combination of steps (say, zero-inflated
data with a nested design and spatial autocorrelation, while predictors are highly
correlated and many values are missing), and you develop a way to cook this dish,
then you should do (at least) two things: first, evaluate your method for its ability
to detect an effect that you know is there (“sensitivity”). Second, evaluate its
specificity, i.e. its ability to not detect effects that you know are absent. Both
evaluations should be amply replicated, should be based on simulated data (so that
you know the truth) and should (finally) confirm that your new method is reliable!
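Such a simulation-based check can be set up in a few lines. The sketch below uses a deliberately simple stand-in for "your method" — an ordinary regression slope test with a hard-coded approximate 5% critical value — purely to show the sensitivity/specificity bookkeeping; the effect size, replicate count, and critical value are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_significant(x, y, t_crit=2.01):
    """Two-sided t-test for a regression slope; 2.01 approximates the
    5% critical value for ~48 degrees of freedom (an assumed constant)."""
    x, y = x - x.mean(), y - y.mean()
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x @ x))
    return abs(beta / se) > t_crit

def detection_rate(effect, n_rep=200, n=50):
    """Simulate data with a known true slope; count how often it is detected."""
    hits = 0
    for _ in range(n_rep):
        x = rng.normal(size=n)
        y = effect * x + rng.normal(size=n)
        hits += slope_significant(x, y)
    return hits / n_rep

sensitivity = detection_rate(effect=0.6)  # the effect is truly there
false_pos = detection_rate(effect=0.0)    # the effect is truly absent
print(sensitivity, false_pos)
```

Because the data are simulated, the truth is known: a reliable method should detect the real effect in most replicates (high sensitivity) while flagging the absent effect only about as often as its nominal error rate allows (high specificity).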
Spatial Autocorrelation
Spatial autocorrelation (SAC) refers to the phenomenon that data points close to each
other in space are more alike than those further apart. For example, species richness in
a given site is likely to be similar to a site nearby, but very different from sites far away.
This is mainly due to the fact that the environment is more similar within a shorter
distance. Hence, SAC in the raw data (species occurrence) is a consequence of SAC in
the environment (topography, climate), something Legendre (1993) termed “spatial
11 http://www.machinelearning.org/ is a good place to start exploring this field.
12 Most of these “complications” can be handled by standard extensions of GLMs (see, e.g. Bolker
2008, and various dedicated R-packages). They will, however, make the model less stable, require
longer run times and still rely on getting the distribution right. There is, of course, the alternative of
Bayesian implementations. Since these are also fundamentally maximum likelihood approaches,
they are similar to sophisticated GLMs. In any case, there is no Bayesian Boosted Regression Tree
(not to speak of a combination with spatial terms and mixed effects). It runs against the Bayesian
philosophy to use boosting or bagging, and there is no efficient implementation either.
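The SAC described in this section is commonly quantified with Moran's I, a spatially weighted analogue of a correlation coefficient: values near +1 indicate that neighbouring sites are alike, values near 0 indicate no spatial structure. A minimal sketch on a simulated one-dimensional transect (the coordinates, the smooth environmental gradient, and the distance-1 neighbour rule are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy transect: 30 sites along a smooth environmental gradient, so
# nearby sites have similar richness -- i.e. the data show SAC.
coords = np.arange(30.0)
env = np.sin(coords / 5.0)
richness = env + rng.normal(scale=0.3, size=30)

# Binary neighbour weights: sites within distance 1 count as neighbours.
d = np.abs(coords[:, None] - coords[None, :])
W = ((d > 0) & (d <= 1.0)).astype(float)

# Moran's I: spatial covariance among neighbours relative to total variance.
z = richness - richness.mean()
I = (len(z) / W.sum()) * (z @ W @ z) / (z @ z)
print(round(I, 2))  # clearly positive: nearby sites are alike
```

Because the richness values inherit their spatial structure from the environmental gradient, the statistic comes out strongly positive here — exactly the situation of SAC in the response being driven by SAC in the environment.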