7.4 Additional Classification Methods
Besides the two classifiers introduced in this chapter, several other methods are
commonly used for classification, including bagging [15], boosting [5], random
forest [4], and support vector machines (SVM) [16]. Bagging, boosting, and random
forest are all examples of ensemble methods that use multiple models to obtain
better predictive performance than can be obtained from any of the constituent
models.
Bagging (or bootstrap aggregating) [15] uses the bootstrap technique, which
repeatedly samples with replacement from a dataset according to a uniform
probability distribution. "With replacement" means that when a sample is selected
for a training set, it remains in the dataset and may be selected again. Because
the sampling is with replacement, some samples may appear several times in a
bootstrap training set, whereas others may be absent. A model or base classifier
is trained separately on each bootstrap sample, and a test sample is assigned to
the class that receives the highest number of votes from these classifiers.
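For concreteness, the sketch below trains a bagged ensemble of decision trees with scikit-learn. The library, the synthetic dataset, and the hyperparameter values are illustrative choices of ours, not part of the cited method.
```python
# A minimal bagging sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset, split into training and test sets.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 base trees is trained on its own bootstrap sample,
# drawn with replacement from the training set (bootstrap=True).
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
bagging.fit(X_train, y_train)

# Prediction aggregates the base trees' outputs over the ensemble.
print("bagging accuracy:", bagging.score(X_test, y_test))
```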
Like bagging, boosting (or AdaBoost) [17] combines the votes of individual
models of the same type to classify a sample. Unlike bagging, however, boosting
is an iterative procedure in which each new model is influenced by the
performance of the models built before it. Boosting also assigns each training
sample a weight that reflects its importance, and these weights may change
adaptively at the end of each boosting round. Both bagging and boosting have
been shown to outperform a single decision tree [5].
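A corresponding AdaBoost sketch follows, again with illustrative scikit-learn settings; depth-1 trees ("stumps") are a conventional, though not mandatory, choice of base classifier.
```python
# A minimal AdaBoost sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 boosting rounds over depth-1 trees; after each round the sample
# weights are adapted so that the next tree focuses on the samples
# the current ensemble misclassifies.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
boosting.fit(X_train, y_train)

# The final prediction is a weighted vote over all boosting rounds.
print("boosting accuracy:", boosting.score(X_test, y_test))
```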
Random forest [4] is a class of ensemble methods built on decision tree
classifiers. It combines tree predictors such that each tree depends on the
values of a random vector sampled independently, with the same distribution for
all trees in the forest. A special case of random forest applies bagging to
decision trees, so that each tree's training samples are randomly chosen with
replacement from the original training set.
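The sketch below is one way to fit a random forest with scikit-learn; the hyperparameters shown (100 trees, square-root feature subsampling) are common illustrative defaults rather than values prescribed here.
```python
# A minimal random forest sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is grown on a bootstrap sample of the training
# set, and every split considers only a random subset of the features
# (max_features), which is the extra randomization beyond plain bagging.
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    random_state=0,
)
forest.fit(X_train, y_train)
print("random forest accuracy:", forest.score(X_test, y_test))
```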
SVM [16] is another common classification method that combines linear models
with instance-based learning techniques. Support vector machines select a small
number of critical boundary instances called support vectors from each class and
build a linear decision function that separates them as widely as possible, that
is, the maximum-margin hyperplane. SVMs perform linear classification
efficiently and, by substituting a kernel function for the ordinary dot product,
can perform nonlinear classification as well.
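As a final illustration, the sketch below fits a linear SVM with scikit-learn and inspects its support vectors; the dataset and parameter values are again illustrative assumptions.
```python
# A minimal SVM sketch (scikit-learn; illustrative settings).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM finds the maximum-margin hyperplane; swapping the
# kernel (e.g., kernel="rbf") yields a nonlinear decision boundary.
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)

# n_support_ counts the support vectors per class: the small set of
# critical boundary instances that determines the decision function.
print("support vectors per class:", svm.n_support_)
print("svm accuracy:", svm.score(X_test, y_test))
```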