7.4 Additional Classification Methods
Besides the two classifiers introduced in this chapter, several other methods are
commonly used for classification, including bagging [15], boosting [5], random
forest [4], and support vector machines (SVM) [16]. Bagging, boosting, and random
forest are all examples of ensemble methods that use multiple models to obtain
better predictive performance than can be obtained from any of the constituent
models.
Bagging (or bootstrap aggregating) [15] uses the bootstrap technique, which
repeatedly samples with replacement from a dataset according to a uniform
probability distribution. "With replacement" means that when a sample is selected
for a training set, it remains in the dataset and may be selected again. Because
the sampling is with replacement, some samples may appear several times in a
bootstrap training set, whereas others may be absent. A model or base classifier
is trained separately on each bootstrap sample, and a test sample is assigned to
the class that receives the highest number of votes from these classifiers.
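For concreteness, the sketch below trains a bagged ensemble of decision trees with scikit-learn. The library, the synthetic dataset, and the hyperparameter values are illustrative choices of ours, not part of the cited method.
```python
# A minimal bagging sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset, split into training and test sets.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 base trees is trained on its own bootstrap sample,
# drawn with replacement from the training set (bootstrap=True).
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
bagging.fit(X_train, y_train)

# Prediction aggregates the base trees' outputs over the ensemble.
print("bagging accuracy:", bagging.score(X_test, y_test))
```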
Like bagging, boosting (or AdaBoost) [17] combines the votes of individual
models of the same type to classify a sample. Unlike bagging, however, boosting
is an iterative procedure in which each new model is influenced by the
performance of the models built before it. Boosting also assigns each training
sample a weight that reflects its importance, and these weights may change
adaptively at the end of each boosting round. Both bagging and boosting have
been shown to outperform a single decision tree [5].
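A corresponding AdaBoost sketch follows, again with illustrative scikit-learn settings; depth-1 trees ("stumps") are a conventional, though not mandatory, choice of base classifier.
```python
# A minimal AdaBoost sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 boosting rounds over depth-1 trees; after each round the sample
# weights are adapted so that the next tree focuses on the samples
# the current ensemble misclassifies.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
boosting.fit(X_train, y_train)

# The final prediction is a weighted vote over all boosting rounds.
print("boosting accuracy:", boosting.score(X_test, y_test))
```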
Random forest [4] is a class of ensemble methods built on decision tree
classifiers. It combines tree predictors such that each tree depends on the
values of a random vector sampled independently, with the same distribution for
all trees in the forest. A special case of random forest applies bagging to
decision trees, so that each tree's training samples are randomly chosen with
replacement from the original training set.
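The sketch below is one way to fit a random forest with scikit-learn; the hyperparameters shown (100 trees, square-root feature subsampling) are common illustrative defaults rather than values prescribed here.
```python
# A minimal random forest sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is grown on a bootstrap sample of the training
# set, and every split considers only a random subset of the features
# (max_features), which is the extra randomization beyond plain bagging.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    random_state=0,
)
forest.fit(X_train, y_train)
print("random forest accuracy:", forest.score(X_test, y_test))
```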
SVM [16] is another common classification method that combines linear models
with instance-based learning techniques. Support vector machines select a small
number of critical boundary instances called support vectors from each class and
build a linear decision function that separates them as widely as possible, that
is, the maximum-margin hyperplane. SVMs perform linear classification
efficiently and, by substituting a kernel function for the ordinary dot product,
can perform nonlinear classification as well.
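As a final illustration, the sketch below fits a linear SVM with scikit-learn and inspects its support vectors; the dataset and parameter values are again illustrative assumptions.
```python
# A minimal SVM sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM finds the maximum-margin hyperplane; swapping the
# kernel (e.g., kernel="rbf") yields a nonlinear decision boundary.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)

# n_support_ counts the support vectors per class: the small set of
# critical boundary instances that determines the decision function.
print("support vectors per class:", svm.n_support_)
print("svm accuracy:", svm.score(X_test, y_test))
```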