plotted to gauge its shape. For interactions, we need to plot the response to each
variable at each level of the other variable, thus visualizing synergistic or
compensatory effects of the two variables.
The key idea behind SDM, namely the environmental niche of a species, implies a
hump-shaped relationship between any environmental predictor and a species'
occurrence: there are lower and upper limits. Hence, we must allow the model to
be non-linear. If we happen to sample only part of the entire gradient, we also
need to consider saturation curves, which are again non-linear. The simplest, and
generally sufficient, way to include non-linearity is to generate a new, squared
variable for each continuous predictor.⁸ This represents the third (quadratic) term
of a Taylor series (which, with enough terms, can approximate any smooth function).
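As a minimal sketch of the squared-term approach, the R code below simulates occurrences of a hypothetical species whose probability of occurrence peaks at an intermediate temperature (the data, the variable names and the true optimum of 15 are invented for illustration). A binomial GLM with only a linear term cannot represent the hump; adding I(temp^2) can, and the estimated optimum follows from the vertex of the fitted parabola, -b1 / (2 b2).

```r
# Hypothetical simulated data: occurrence of a species along a temperature
# gradient, with a true optimum at 15 (values invented for illustration).
set.seed(42)
n    <- 500
temp <- runif(n, min = 0, max = 30)
eta  <- -8 + 1.2 * temp - 0.04 * temp^2     # hump-shaped linear predictor
occ  <- rbinom(n, size = 1, prob = plogis(eta))

# Linear term only: the logistic curve can only rise or fall monotonically
m_lin  <- glm(occ ~ temp, family = binomial)
# Linear plus squared term: allows a hump (or a saturating shape)
m_quad <- glm(occ ~ temp + I(temp^2), family = binomial)

b       <- coef(m_quad)
optimum <- unname(-b["temp"] / (2 * b["I(temp^2)"]))  # vertex of the parabola
print(optimum)                # estimated temperature of maximum occurrence
print(AIC(m_lin, m_quad))     # the quadratic model should fit far better
```

A negative coefficient for the squared term indicates a hump; a positive one would indicate a U-shape, which is rarely plausible for a niche response.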
When using GAMs or other spline-based approaches, non-linearity is governed by
the smoothing function used. Here the issue is not so much how to model
non-linearity, but rather how much non-linearity we allow. Reducing the
“wiggliness” of splines (either by stepwise model selection of the number of knots
in each predictor⁹ or by shrinkage of spline fits¹⁰) prevents over-fitting and should
be the standard approach.
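One way to shrink spline fits is sketched below, assuming package mgcv is available (it ships as a recommended package with standard R installations). It uses the shrinkage smoother bs = "ts"; the select = TRUE argument of gam() is an alternative. The simulated data are hypothetical: x1 has a hump-shaped effect and x2 is pure noise. The effective degrees of freedom (edf) of each smooth show how much wiggliness the penalty leaves in each term.

```r
library(mgcv)   # recommended package, distributed with standard R

# Hypothetical simulated data: x1 has a hump-shaped effect, x2 has none
set.seed(1)
n   <- 400
x1  <- runif(n)
x2  <- runif(n)
occ <- rbinom(n, size = 1, prob = plogis(-2 + 16 * x1 * (1 - x1)))

# bs = "ts": thin plate regression spline with an extra shrinkage penalty,
# so an unsupported smooth can be shrunk towards zero. (Setting
# select = TRUE in gam() is an alternative way to add such a penalty.)
fit <- gam(occ ~ s(x1, bs = "ts") + s(x2, bs = "ts"),
           family = binomial, method = "REML")

# Effective degrees of freedom per smooth: the "wiggliness" that the
# penalty leaves in each term
edf <- summary(fit)$edf
names(edf) <- c("s(x1)", "s(x2)")
print(round(edf, 2))
```

The noise predictor should end up with a much smaller edf than x1, which is the shrinkage doing the model simplification that stepwise knot selection would otherwise do.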
Interactions are similarly relevant. Statistically, an interaction term is the product
of the participating predictors. Ecologically, it means that the effect of one variable
depends on the value of the other, so we need to know the values of all variables
included in the interaction, not only their main effects. Because this is highly
relevant and often difficult for beginners, let me briefly give an example. Assume
that global patterns of plant diversity are well predicted by the predictors “annual
precipitation” and “mean annual temperature” and their interaction. Judged by the
main effects alone, wetter or hotter sites have more species, but not necessarily.
When a site is hot, it also needs to be wet to have high species richness; otherwise
it may well be a barren desert. But a cold site will never support many plant species,
regardless of precipitation. In this example, neither temperature nor rainfall alone is
sufficient to predict species richness at any site; we need to interpret them in concert.
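The plant-diversity example can be mimicked with simulated data. In this hypothetical sketch, temperature and precipitation are both scaled from 0 to 1, and richness rises with precipitation only where it is warm (the numbers are invented). The formula temp * precip expands to both main effects plus their product temp:precip, so the slope of precipitation is b_precip + b_int × temp and has to be read at a given temperature. Predicting at the corners of the gradient plots each variable at each level of the other, as recommended above.

```r
# Hypothetical simulated data mimicking the plant-richness example:
# both predictors scaled from 0 (cold / dry) to 1 (hot / wet)
set.seed(7)
n      <- 300
temp   <- runif(n)
precip <- runif(n)
# Assumed truth: richness rises with precipitation only where it is warm
rich <- 5 + 100 * temp * precip + rnorm(n, sd = 5)

# temp * precip expands to temp + precip + temp:precip
fit <- lm(rich ~ temp * precip)
b   <- coef(fit)

# The slope of precipitation depends on temperature:
# d(rich)/d(precip) = b_precip + b_interaction * temp
slope_precip <- function(t) unname(b["precip"] + b["temp:precip"] * t)
print(c(cold = slope_precip(0), hot = slope_precip(1)))

# Predict each variable at each level of the other (corners of the gradient)
grid <- expand.grid(temp = c(0, 1), precip = c(0, 1))
grid$richness <- predict(fit, newdata = grid)
print(grid)
```

At the cold end the precipitation slope should be close to zero, while at the hot end it is large: the main effect of precipitation alone would misrepresent both situations.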
Classification and regression trees (CARTs) embrace non-linearity and interactions
in an elegant and natural way. Their boosted (BRT) and bagged (randomForest)
extensions therefore do not require non-linearity and interactions to be specified.
Model Simplification
One of the fundamental problems in building statistical models is the trade-off
between the variance explained by the model and the bias it produces when
8 This can be done either manually (X1.2 <- X1^2) or as part of the model formula
(y ~ X1 + I(X1^2)); higher-order polynomials should be specified using poly
(y ~ poly(X1, degree = 3)), which calculates orthogonal polynomials.
9 As proposed for the function gam in package gam: see ?gam::step.Gam (named
step.gam in older versions of gam).
10 As proposed for the function gam in package mgcv: see ?mgcv::step.gam.