lihood estimates of the distribution parameters are obtained by maximizing the
expected likelihood of the examples computed in the expectation step. The algorithm be-
gins with random values of the distribution parameters and iterates between the
E and M steps until it converges. The most demanding computational part of the
algorithm is the expectation step, where each sample in the data must be accessed.
However, this is also a task that is trivial to parallelize; each node in a parallel
system can observe some of the data and compute the statistics independently of
other nodes. The total update can then be aggregated at a central location (a mas-
ter node) and the M step executed. Alternatively, stochastic methods have been
suggested for online training of the EM algorithm [31].
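
As a rough illustration of this aggregation scheme, the sketch below simulates the distributed E step for a one-dimensional Gaussian mixture: each data partition stands in for a node, returns only its sufficient statistics, and a master routine sums them and performs the M step. The partition layout, the use of NumPy, and the function names are illustrative assumptions, not details taken from the text.

    import numpy as np

    def e_step_local(x, means, variances, weights):
        # One "node": compute responsibilities for its slice of the data and
        # return only the sufficient statistics needed for the M step.
        dens = np.stack([
            w * np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
            for m, v, w in zip(means, variances, weights)
        ], axis=1)                                  # shape (n_local, k)
        r = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
        return r.sum(axis=0), r.T @ x, r.T @ (x ** 2)

    def m_step(stats, n_total):
        # Master node: sum the statistics from all partitions, then update parameters.
        n_k = sum(s[0] for s in stats)
        s_x = sum(s[1] for s in stats)
        s_xx = sum(s[2] for s in stats)
        weights = n_k / n_total
        means = s_x / n_k
        variances = s_xx / n_k - means ** 2
        return means, variances, weights

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
    partitions = np.array_split(data, 4)            # stand-ins for four worker nodes

    means = np.array([-1.0, 1.0])
    variances = np.array([1.0, 1.0])
    weights = np.array([0.5, 0.5])
    for _ in range(50):
        stats = [e_step_local(p, means, variances, weights) for p in partitions]
        means, variances, weights = m_step(stats, len(data))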
When the underlying distribution of data cannot be modeled using paramet-
ric distributions, or when these distributions are unknown, one is forced to use
other classification approaches. One of these is to model the density of data for
each of the classes and to classify test points according to the maximum posterior
probability. Frequently, this is done using the Parzen windows estimator [32]. This
method is, however, expensive both computationally and memory-wise. Addition-
ally, a large number of training points is required for an accurate estimation of
density. Therefore, this method is suitable mainly for small to medium datasets.
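
A minimal sketch of this approach is given below, assuming a Gaussian kernel with a single shared bandwidth h (so the normalization constant cancels when comparing classes); the function name and the NumPy-based implementation are illustrative assumptions rather than details from the text.

    import numpy as np

    def parzen_classify(x_test, x_train, y_train, h=0.5):
        # Estimate each class-conditional density with a Gaussian Parzen window
        # and label test points by the maximum posterior (prior times density).
        classes = np.unique(y_train)
        scores = []
        for c in classes:
            xc = x_train[y_train == c]               # all training points of class c
            prior = len(xc) / len(x_train)
            # Every training point of the class is visited for every test point,
            # which is the computational and memory cost mentioned above.
            d2 = ((x_test[:, None, :] - xc[None, :, :]) ** 2).sum(axis=-1)
            density = np.exp(-d2 / (2 * h ** 2)).mean(axis=1)
            scores.append(prior * density)
        return classes[np.argmax(np.stack(scores), axis=0)]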
A different method for classification uses a naive Bayes classifier. This classifier
works under the assumption that each feature is independent of other features.
Under this assumption, using Bayes' theorem, the probability that a sample was
drawn from a given class is simply the prior probability of that class multiplied by
the product, over the features, of the probability that the sample's value for each
feature is drawn from that class.
The probabilities for each feature are approximated with relative frequencies found
in the training set.
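
A compact sketch of such a classifier for categorical features follows; the use of unsmoothed relative frequencies and the function names are simplifying assumptions for illustration.

    from collections import Counter, defaultdict

    def train_naive_bayes(samples, labels):
        # Count class occurrences and, for each (class, feature) pair,
        # the occurrences of each feature value in the training set.
        class_counts = Counter(labels)
        value_counts = defaultdict(Counter)
        for x, y in zip(samples, labels):
            for j, v in enumerate(x):
                value_counts[(y, j)][v] += 1
        return class_counts, value_counts

    def classify(x, class_counts, value_counts, n_train):
        # Score each class by its prior times the product of per-feature
        # relative frequencies, and return the highest-scoring class.
        best_class, best_score = None, -1.0
        for c, n_c in class_counts.items():
            score = n_c / n_train
            for j, v in enumerate(x):
                score *= value_counts[(c, j)][v] / n_c
            if score > best_score:
                best_class, best_score = c, score
        return best_class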
In spite of their naive design and underlying assumptions, naive Bayes classifiers
often work well on real-world problems. This has been shown to have a theoretical
justification [33].
Speeding up a naive Bayes classifier (i.e., computing the frequencies of feature
values) can be done either by sending different features to different processing
nodes or by partitioning the data across the nodes and aggregating the frequencies
returned from each node.
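
The data-partitioning variant can be sketched as follows: each partition plays the role of a node and returns local counters, which the master simply adds together. The use of Python's multiprocessing pool, the toy data, and the function names are illustrative assumptions.

    from collections import Counter
    from multiprocessing import Pool

    def local_frequencies(chunk):
        # Worker node: count class labels and (class, feature, value) triples
        # on its own partition of the training data.
        samples, labels = chunk
        class_counts, value_counts = Counter(), Counter()
        for x, y in zip(samples, labels):
            class_counts[y] += 1
            for j, v in enumerate(x):
                value_counts[(y, j, v)] += 1
        return class_counts, value_counts

    def aggregate(results):
        # Master node: counters from the partitions simply add up.
        class_counts, value_counts = Counter(), Counter()
        for c, v in results:
            class_counts += c
            value_counts += v
        return class_counts, value_counts

    if __name__ == "__main__":
        chunks = [
            ([("sunny", "hot"), ("rainy", "mild")], ["no", "yes"]),
            ([("sunny", "mild"), ("rainy", "hot")], ["yes", "no"]),
        ]
        with Pool(2) as pool:
            class_counts, value_counts = aggregate(pool.map(local_frequencies, chunks))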
Most classification algorithms directly try to construct a boundary between
classes to minimize the test error. The k-nearest neighbor classifier is probably the
simplest classification method. This classifier labels test points according to a ma-
jority vote amongst the k closest training points. This method frequently achieves
remarkably low classification errors, but it suffers from two main drawbacks. The
first is that it is computationally expensive because of the need to find the k closest
training points to each test point. The second drawback is that a large amount of memory is
required for storing the training data. One way to overcome the first drawback is
through efficient data structures such as the KD-tree [34] or, more recently, cover
trees [35].
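
A short sketch combining both remedies mentioned above is shown here: a KD-tree (built with SciPy's cKDTree, used as one possible implementation) answers the nearest-neighbour queries, and a majority vote labels each test point. The function name and the choice of SciPy are assumptions for illustration, not taken from the references.

    import numpy as np
    from collections import Counter
    from scipy.spatial import cKDTree

    def knn_predict(x_train, y_train, x_test, k=5):
        # Build the KD-tree once over the training points, then query the
        # k nearest neighbours of every test point without a full linear scan.
        tree = cKDTree(x_train)
        _, idx = tree.query(x_test, k=k)
        if k == 1:
            idx = idx[:, None]
        y_train = np.asarray(y_train)
        # Majority vote among the k neighbours of each test point.
        return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in idx])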
A different approach to classification uses decision trees. In a decision tree,
each node corresponds to a variable and each arc to a range of possible values of
that variable. The leaves of the tree hold the predicted label values.
Decision trees are built to create leaves that are as pure as possible; that is, they
contain the largest possible fraction of points from a single class. Classification of