Environmental Engineering Reference
In-Depth Information
[Figure: both axes run from 0.0 to 1.0; horizontal axis label: first predictor]
Fig. 13.2 Visualizing the parameter space supported by data. In this case, the top-right and
bottom-left corners of the parameter space of the two predictors have not actually been sampled by
data (despite a low correlation of r = 0.26). The parameter space actually sampled is indicated
by the convex hull, which covers 57% of the area; this proportion declines dramatically with the
number of dimensions (“curse of dimensionality”). In other words, we have few data points with
which to examine interactions of higher order.
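The coverage figure quoted in the caption can be reproduced in principle with a few lines of code. The following is only a minimal sketch in Python, not the procedure used to draw the figure. It assumes that both predictors have been rescaled to the interval [0, 1], so that the full parameter space has area 1. The function names (convex_hull, polygon_area, hull_coverage) and the demonstration data are hypothetical and chosen for illustration only.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def hull_coverage(points):
    """Fraction of the unit square covered by the convex hull of the points.

    Assumes both predictors are already scaled to [0, 1].
    """
    return polygon_area(convex_hull(points))


# Hypothetical demonstration data: 50 random points in the unit square.
random.seed(1)
demo_points = [(random.random(), random.random()) for _ in range(50)]
demo_coverage = hull_coverage(demo_points)
print(f"Convex hull covers {demo_coverage:.0%} of the unit square")
```

This sketch handles two predictors only. With more predictors the hull would have to be computed in higher dimensions, for example with a dedicated computational-geometry library, and the covered fraction can be expected to shrink as described in the caption.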
large: e.g. Hastie et al. 2009). For more traditional approaches (and here I am
thinking of GLMs), we may want to have these steps functionally separated.
Model Formulation
We have reduced our data set to a moderate number of predictors in the step
“Dimensional reduction” above. Now we still need to specify the functional form
in which each predictor is allowed to relate to the response. In earlier years, both
non-linear and interaction terms were often neglected, making many findings of
early studies less trustworthy. Modern methods (such as BRT) have non-linearity
and interactions built in automatically. It is still important to understand the
relevance of non-linearity and interactions, even when using tree-based methods,
because we still have to be able to interpret the results. The variable-importance
information often returned by machine-learning algorithms does not
allow us to see how the variables act. As shown in the case study at
http://www.mced-ecology.org (Where's the sperm whale?), the functional relationship must be
 