Consequently, numerous methods have been proposed for finding a (suboptimal)
solution by testing a fraction of the possible combinations.
Feature selection methods can be divided into three main types [4]:
1. Wrapper methods: The feature selection is performed around (and with)
a given classification algorithm. The classification algorithm is used for
ranking possible feature combinations.
2. Embedded methods: The feature selection is embedded within the classifi-
cation algorithm.
3. Filter methods: Features are selected for classification independently of the
classification algorithm.
The simplest wrapper algorithms are exhaustive search (which, as noted above,
is practical only when the number of features is small), sequential forward feature
selection (SFFS), and sequential backward feature selection (SBFS). More advanced
wrapper algorithms include simulated annealing and genetic algorithms [5]. Se-
quential forward feature selection works by first selecting the feature with which
the lowest classification error is reached and then greedily adding features that
provide the largest reduction in error. This process is continued until the classifi-
cation error starts to increase. The underlying assumption of this method is that
features can be evaluated individually. While this assumption often works well in
practice, it may cause this method to miss combinations of features that are useful
only when selected together.
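As a concrete illustration, the following is a minimal sketch of sequential forward feature selection, assuming a scikit-learn-style classifier and cross-validated accuracy as the scoring function; the logistic-regression scorer and the function name sequential_forward_selection are illustrative choices, not part of any specific published method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, estimator=None, cv=5):
    """Greedily add the feature that most reduces cross-validated
    classification error; stop when no candidate lowers the error."""
    if estimator is None:
        estimator = LogisticRegression(max_iter=1000)  # assumed scorer
    n_features = X.shape[1]
    selected = []          # indices of the features chosen so far
    best_error = np.inf
    while len(selected) < n_features:
        errors = {}
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            acc = cross_val_score(estimator, X[:, cols], y, cv=cv).mean()
            errors[j] = 1.0 - acc          # error of this candidate subset
        best_j = min(errors, key=errors.get)
        if errors[best_j] >= best_error:
            break                          # error stopped decreasing
        selected.append(best_j)
        best_error = errors[best_j]
    return selected, best_error
```

Because each step keeps only the single best feature, this sketch inherits the limitation noted above: a pair of features that is useful only jointly may never be selected.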
Because wrapper algorithms use a classification algorithm as the scoring func-
tion, it is easy to parallelize them if the classification algorithm can be parallelized.
Alternatively, methods such as SFFS and SBFS can be parallelized by exploiting
the fact that, at each step, candidate features are evaluated independently of one
another, so different features can be assigned to different processors.
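For example, under the assumption that candidate subsets can be scored independently, a single forward-selection step can be distributed across worker processes; ProcessPoolExecutor and the helper names below are illustrative choices.

```python
from concurrent.futures import ProcessPoolExecutor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def _candidate_error(args):
    """Score one candidate feature added to the current subset."""
    X, y, selected, j, cv = args
    estimator = LogisticRegression(max_iter=1000)   # assumed scorer
    acc = cross_val_score(estimator, X[:, selected + [j]], y, cv=cv).mean()
    return j, 1.0 - acc

def parallel_forward_step(X, y, selected, cv=5, workers=4):
    """Evaluate every remaining candidate feature in parallel and return
    the (feature index, error) pair with the lowest error."""
    candidates = [j for j in range(X.shape[1]) if j not in selected]
    tasks = [(X, y, selected, j, cv) for j in candidates]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(_candidate_error, tasks))
    return min(results, key=lambda r: r[1])
```

On platforms that spawn rather than fork worker processes, the call to parallel_forward_step should be placed under an if __name__ == "__main__": guard.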
One example of the filtering approach to feature selection is that developed
by Koller and Sahami [6]. Here, information theoretic considerations are used
to decide on the most informative features, without the need for a classification
algorithm. This method estimates the cross-entropy between every pair of features
and discards those features that have a large cross-entropy with other features, thus
removing features that add little additional classification information. However,
because the cross-entropy must be estimated for every pair of features, this
algorithm scales quadratically with the number of features. Thus,
only data with a small to medium number of features can be evaluated using this
method. We note, however, that this method is easy to parallelize because of the
independence between the estimation procedure for each pair of features.
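As a rough illustration of this quadratic cost, the sketch below estimates a cross-entropy between the (discretized) marginal distributions of every pair of features; the equal-width binning and Laplace smoothing are assumptions made here for simplicity, and the full procedure of Koller and Sahami [6] is more involved than this marginal-distribution sketch.

```python
import numpy as np

def pairwise_cross_entropy(X, n_bins=10):
    """Return an n_features x n_features matrix H where
    H[i, j] = -sum_k p_i(k) * log p_j(k), estimated from histograms of
    each (discretized) feature. The double loop is what makes this style
    of filter scale quadratically with the number of features."""
    n_samples, n_features = X.shape
    probs = np.empty((n_features, n_bins))
    for i in range(n_features):
        hist, _ = np.histogram(X[:, i], bins=n_bins)
        probs[i] = (hist + 1.0) / (n_samples + n_bins)  # Laplace smoothing
    H = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(n_features):
            H[i, j] = -np.sum(probs[i] * np.log(probs[j]))
    return H
```

Because each entry of the matrix is estimated independently of the others, the double loop can be split across processors, which is the parallelization opportunity mentioned above.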
An altogether different approach to reducing dimensions is through feature
reduction, that is, finding a low-dimensional combination of features. This com-
bination can be linear or nonlinear.
Arguably the best known of these methods is principal component analysis
[PCA, also called the Karhunen-Loève transform (KLT)]. PCA projects the data
onto the directions of maximal variance. This is done by finding the eigenvectors
corresponding to the largest eigenvalues of the covariance matrix of the data and
projecting the data onto these eigenvectors. Thus, the low-dimensional features
are linear combinations of the original features.
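A minimal sketch of this procedure, using an eigendecomposition of the sample covariance matrix in NumPy (production implementations typically rely on the singular value decomposition for numerical stability):

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the data onto the eigenvectors of its covariance matrix
    that correspond to the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues, largest first
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                   # low-dimensional features
```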