Environmental Engineering Reference
the studied group of organisms lives, e.g. aquatic (river or sea) or terrestrial (forest
or agricultural fields). Another dimension is the type of applied machine learning
technique.
The major advantages of decision tree methods include the ability to capture
interactions between the variables used for modelling, the understandability of the
produced models (trees) and their efficiency. Decision tree learning methods can
build models quickly from large quantities of data, involving a large number
of records (examples), a large number of columns (variables), or both. Also,
decision tree models make predictions very quickly and can be used to classify large
numbers of examples. This is important in the context of pixel-based classification
in geographical information systems, where very large numbers of spatial units/
points need to be classified.
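As a minimal sketch of how a tree captures an interaction between variables, the following pure-Python example grows a small classification tree on toy data in which the class depends on the combination of two variables and neither variable is informative on its own. The splitting criterion (Gini impurity with threshold splits), the function names and the data are assumptions made for illustration, not the method of any particular tool discussed in the text.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (variable index, threshold) with the lowest weighted impurity, or None."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; a leaf stores the majority class of its examples."""
    split = best_split(rows, labels) if gini(labels) > 0 and depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Classify one example by walking from the root to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy data (hypothetical): the class depends only on the combination of x0 and x1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["absent", "present", "present", "absent"]

tree = build(rows, labels)
predictions = [predict(tree, r) for r in rows]
print(tree)
print(predictions)
```

Prediction only follows one path from the root to a leaf, so its cost per example grows with the depth of the tree rather than with the size of the training set. This is what makes classifying very large numbers of spatial units practical.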
Decision tree learning is also capable of identifying the relevant variables from a
large set of independent variables. The resulting trees typically use only a few of the
variables available. This, however, can be a disadvantage in some situations:
if all the variables available contribute to the classification, it is very likely that the
tree will not use them all and will hence have lower performance.
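The variable selection described above can be sketched as follows: at each node, a greedy tree learner scores every variable by how much its best single split reduces class impurity and tests only the top-scoring one. The example below computes these scores for three variables on a small invented data set. The Gini criterion, the function names and the values are assumptions for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_gain(rows, labels, f):
    """Largest impurity reduction achievable by one threshold split on variable f."""
    parent = gini(labels)
    best = 0.0
    values = sorted(set(r[f] for r in rows))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[f] <= t]
        right = [y for r, y in zip(rows, labels) if r[f] > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best

# Hypothetical data: the class is determined by variable 0 alone;
# variables 1 and 2 are unrelated to the class.
rows = [(1, 3, 7), (2, 8, 2), (3, 1, 5), (4, 6, 9),
        (6, 2, 6), (7, 7, 1), (8, 4, 8), (9, 5, 3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

gains = [best_gain(rows, labels, f) for f in range(3)]
chosen = max(range(3), key=lambda f: gains[f])
print(gains, chosen)
```

Once variable 0 has been used, both child nodes are pure and the tree stops, so variables 1 and 2 never appear in the model. The same mechanism explains the disadvantage noted in the text: when many variables each contribute a little, a small tree has room to test only a few of them.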
Other situations where decision trees may encounter problems are domains
where the variables are completely independent. In addition, small numbers of
examples/records are quite problematic for decision trees. In both situations, using
methods like linear or logistic regression would be more appropriate.
Decision trees are derived from data only. Little or no domain knowledge
is used in the learning process. As such, they represent the data-driven
or empirical approach to ecological model construction, which is more
appropriate when we have plenty of high-quality (reliable and relevant) measured
data and little knowledge about the studied system. When only few or low-quality
(unreliable or irrelevant) data are available, and/or there is considerable
knowledge about the studied system, the classical knowledge-based paradigm of manual
model construction could be more appropriate.