Algorithm 2 High-Level Pseudo-Code for Building a Decision Tree
1: Given: Training set T
2: function Decision-Tree-Building(Set T)
3:     if T meets the stopping criterion (required purity reached) then
4:         Return
5:     else
6:         for each attribute i in T do
7:             Find the best value to split T along the i-th attribute
8:         end for
9:         Use the attribute that gives the best split among all the attributes to partition T into T1 and T2
10:        Decision-Tree-Building(T1)
11:        Decision-Tree-Building(T2)
12:    end if
13: end function
The classification of a new test point is performed by traversing the tree from its root to a leaf,
following the arcs that correspond to the values of the example's features.
The most well-known algorithms for building decision trees are CART [36],
ID3 [37], and C4.5 [38]. These algorithms construct the decision tree in a greedy
fashion, starting from the root. A schematic representation of a tree-building
algorithm is shown in Algorithm 2. At each node, the data is split according to a
decision criterion based on the statistics of each feature. This process is repeated
recursively until the nodes reach a desired purity.
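To make the recursion in Algorithm 2 concrete, the sketch below implements the same high-level steps in Python: compute the best split for each attribute, partition the data on the overall best split, and recurse until the required purity is reached. It is an illustrative sketch only; the impurity measure (Gini), the dictionary-based node representation, and all function names are choices made here, not details taken from CART, ID3, or C4.5.

import numpy as np

def gini(y):
    # Gini impurity of a label array (0 means the node is pure).
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # For each attribute, find the threshold with the largest impurity decrease.
    best_attr, best_thr, best_gain = None, None, 0.0
    parent = gini(y)
    for i in range(X.shape[1]):
        for thr in np.unique(X[:, i])[:-1]:
            left, right = y[X[:, i] <= thr], y[X[:, i] > thr]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if gain > best_gain:
                best_attr, best_thr, best_gain = i, thr, gain
    return best_attr, best_thr, best_gain

def build_tree(X, y, min_gain=1e-6):
    # Stopping criterion: the node is pure, or no split improves purity.
    if gini(y) == 0.0:
        return {"leaf": True, "label": y[0]}
    attr, thr, gain = best_split(X, y)
    if attr is None or gain < min_gain:
        # Majority label (assumes small non-negative integer labels).
        return {"leaf": True, "label": np.bincount(y).argmax()}
    mask = X[:, attr] <= thr
    return {"leaf": False, "attr": attr, "thr": thr,
            "left": build_tree(X[mask], y[mask], min_gain),
            "right": build_tree(X[~mask], y[~mask], min_gain)}

def predict(node, x):
    # Classify a test point by traversing from the root to a leaf.
    while not node["leaf"]:
        node = node["left"] if x[node["attr"]] <= node["thr"] else node["right"]
    return node["label"]

The predict function also shows the traversal described above: a test point follows the arc that matches each of its feature values until it reaches a leaf.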
Decision trees are a simple yet effective classification method. One of their main
advantages is that they provide human-readable classification rules. They also
have several drawbacks, especially when trained on large datasets, where the
resulting trees tend to be complicated and in turn require either pruning or
a large amount of memory for storage. There is considerable literature on methods for
simplifying and pruning decision trees (for example, [39]). Techniques for handling
large data on a single node include transformation and sorting (SPRINT, [40]),
sorting and the use of appropriate data structures (SLIQ, [41]), sampling [38],
and approximation (SPEC, [42], with or without parallelization). Direct use of
parallelization has been demonstrated, for example, in [43].
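As a concrete illustration of pruning (not one of the specific methods cited above), scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier; the synthetic data and the value of ccp_alpha below are placeholders chosen only for demonstration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree can grow large; cost-complexity pruning trades
# a little training accuracy for a much smaller tree.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("unpruned:", full.get_n_leaves(), "leaves, test accuracy", full.score(X_te, y_te))
print("pruned:  ", pruned.get_n_leaves(), "leaves, test accuracy", pruned.score(X_te, y_te))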
Many classification algorithms build a functional mapping from input patterns
to output labels, optimizing an objective function that aims to minimize the training
error.
A simple linear mapping can be performed using the Moore-Penrose pseudo-inverse,
which links the input patterns to the labels via a weighted sum of the input
patterns chosen to minimize the mean square error. If the training patterns are
represented in a matrix P of size N × D, where D is the input dimension and N the
number of examples, and the corresponding labels in an N × 1 vector T with entries
in {−1, +1}, the optimal weights for the classifier are given by

    w = (P^T P)^{-1} P^T T                                          (7.1)
and the classification rule assigns a new pattern x the label sign(x^T w).
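A minimal NumPy sketch of Eq. (7.1) and of this sign-based decision rule, assuming the patterns are stored row-wise in an N × D matrix and the labels take values in {−1, +1}; the function names and the toy data are illustrative only.

import numpy as np

def train_linear_classifier(P, T):
    # Least-squares weights w = (P^T P)^{-1} P^T T, computed here with the
    # Moore-Penrose pseudo-inverse for numerical robustness.
    return np.linalg.pinv(P) @ T

def classify(w, x):
    # Classification rule: the sign of the weighted sum of the input pattern.
    return np.sign(x @ w)

# Toy example: N = 6 patterns in D = 2 dimensions, labels in {-1, +1}.
P = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]])
T = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w = train_linear_classifier(P, T)
print(classify(w, np.array([2.5, 2.0])))  # expected output: 1.0

In practice a constant column is often appended to P so that the linear rule includes a bias term; that detail is omitted here to keep the sketch close to Eq. (7.1).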