Algorithm 2 High-Level Pseudo-Code for Building a Decision Tree
1: Given: Training set T
2: function Decision-Tree-Building(Set T)
3:     if T meets the stopping criterion (required purity reached) then
4:         Return
5:     else
6:         for each attribute i in T do
7:             Find the best value to split T along the i-th attribute
8:         end for
9:         Use the attribute that gives the best split among all the attributes to partition T into T1 and T2
10:        Decision-Tree-Building(T1)
11:        Decision-Tree-Building(T2)
12:    end if
13: end function
The classification of a new test point is performed by traversing the tree from its root to a leaf,
following the arcs that correspond to the values of the example's features.
The most well-known algorithms for building decision trees are CART [36],
ID3 [37], and C4.5 [38]. These algorithms construct the decision tree in a greedy
fashion, starting from the root. A schematic representation of a tree-building
algorithm is shown in Algorithm 2. At each node, the data is split according to a
decision criterion based on the statistics of each feature. This process is repeated
recursively until the nodes reach a desired purity.
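To make the recursion in Algorithm 2 concrete, the sketch below implements the same high-level steps in Python: compute the best split for each attribute, partition the data on the overall best split, and recurse until the required purity is reached. It is an illustrative sketch only; the impurity measure (Gini), the dictionary-based node representation, and all function names are choices made here, not details taken from CART, ID3, or C4.5.

import numpy as np

def gini(y):
    # Gini impurity of a label array (0 means the node is pure).
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # For each attribute, find the threshold with the largest impurity decrease.
    best_attr, best_thr, best_gain = None, None, 0.0
    parent = gini(y)
    for i in range(X.shape[1]):
        for thr in np.unique(X[:, i])[:-1]:
            left, right = y[X[:, i] <= thr], y[X[:, i] > thr]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if gain > best_gain:
                best_attr, best_thr, best_gain = i, thr, gain
    return best_attr, best_thr, best_gain

def build_tree(X, y, min_gain=1e-6):
    # Stopping criterion: the node is pure, or no split improves purity.
    if gini(y) == 0.0:
        return {"leaf": True, "label": y[0]}
    attr, thr, gain = best_split(X, y)
    if attr is None or gain < min_gain:
        # Majority label (assumes small non-negative integer labels).
        return {"leaf": True, "label": np.bincount(y).argmax()}
    mask = X[:, attr] <= thr
    return {"leaf": False, "attr": attr, "thr": thr,
            "left": build_tree(X[mask], y[mask], min_gain),
            "right": build_tree(X[~mask], y[~mask], min_gain)}

def predict(node, x):
    # Classify a test point by traversing from the root to a leaf.
    while not node["leaf"]:
        node = node["left"] if x[node["attr"]] <= node["thr"] else node["right"]
    return node["label"]

The predict function also shows the traversal described above: a test point follows the arc that matches each of its feature values until it reaches a leaf.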
Decision trees are a simple yet effective classification method. One of their main
advantages is that they provide human-readable classification rules. They also
have several drawbacks, especially when trained on large datasets, where the
resulting trees tend to be complicated and in turn require either pruning or
a large amount of memory for storage. There is considerable literature on methods for
simplifying and pruning decision trees (for example, [39]). Techniques for handling
large data on a single node include transformation and sorting (SPRINT, [40]),
sorting and the use of appropriate data structures (SLIQ, [41]), sampling [38],
and approximation (SPEC, [42], with or without parallelization). Direct use of
parallelization has been demonstrated, for example, in [43].
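As a concrete illustration of pruning (not one of the specific methods cited above), scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter of DecisionTreeClassifier; the synthetic data and the value of ccp_alpha below are placeholders chosen only for demonstration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree can grow large; cost-complexity pruning trades
# a little training accuracy for a much smaller tree.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("unpruned:", full.get_n_leaves(), "leaves, test accuracy", full.score(X_te, y_te))
print("pruned:  ", pruned.get_n_leaves(), "leaves, test accuracy", pruned.score(X_te, y_te))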
Many classification algorithms build a functional mapping from input patterns
to output labels, optimizing an objective function that aims to minimize the training
error.
A simple linear mapping can be performed using the Moore-Penrose pseudo-inverse,
which links the input patterns to the labels via a weighted sum of the input
patterns chosen to minimize the mean square error. If the training patterns are
represented in a matrix P of size N × D, where D is the input dimension and N the
number of examples, and the corresponding labels in an N × 1 vector T with entries
in {−1, +1}, the optimal weights for the classifier are given by

    w = (P^T P)^{-1} P^T T                                          (7.1)
and the classification rule assigns a new pattern x the label sign(x^T w).
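A minimal NumPy sketch of Eq. (7.1) and of this sign-based decision rule, assuming the patterns are stored row-wise in an N × D matrix and the labels take values in {−1, +1}; the function names and the toy data are illustrative only.

import numpy as np

def train_linear_classifier(P, T):
    # Least-squares weights w = (P^T P)^{-1} P^T T, computed here with the
    # Moore-Penrose pseudo-inverse for numerical robustness.
    return np.linalg.pinv(P) @ T

def classify(w, x):
    # Classification rule: the sign of the weighted sum of the input pattern.
    return np.sign(x @ w)

# Toy example: N = 6 patterns in D = 2 dimensions, labels in {-1, +1}.
P = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -3.0]])
T = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w = train_linear_classifier(P, T)
print(classify(w, np.array([2.5, 2.0])))  # expected output: 1.0

In practice a constant column is often appended to P so that the linear rule includes a bias term; that detail is omitted here to keep the sketch close to Eq. (7.1).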