Biomedical Engineering Reference
CHAPTER 7
Machine Learning Techniques for Large Data
Elad Yom-Tov
The field of machine learning is devoted to the study and development of
algorithms that attempt to learn from data; that is, they extract rules and
patterns from data provided to them. In this chapter we survey different types
of machine learning algorithms, with a focus on algorithms for pattern
classification and analysis. We further emphasize algorithms that are useful
for processing large data sets, such as those prevalent in computational biology.
7.1 Introduction
Machine learning (ML) methods are algorithms for identifying patterns in large
data collections and for applying actions based on these patterns. ML methods are
data driven in the sense that they extract rules from given data. ML algorithms are
being successfully applied in many domains, including analysis of Internet data,
fraud detection, bioinformatics, computer network analysis, and information
retrieval.
Watanabe [1] described a pattern as "the opposite of chaos; it is an entity,
vaguely defined, that could be given a name." Examples of patterns are DNA
sequences that may cause a certain disease, human faces, and behavior patterns.
A pattern is described by its features. These are the characteristics of the examples
for a given problem. For example, in a bioinformatics task, features could be the
genetic markers that are active or nonactive for a given disease.
Once data is collected, ML methods are usually applied in two stages: training
and testing. During the training phase, data is given to the learning algorithm,
which then constructs a model of the data. The testing phase consists of
generating predictions for new data according to the model. For most algorithms,
the training phase is the computationally intensive phase of learning. Testing
(or applying) the model is generally orders of magnitude faster than the
training phase.
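The two stages can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, which is chosen here only for brevity and is not a method named in this chapter; the function names and the binary genetic-marker data are hypothetical.

```python
from statistics import mean

def train(examples, labels):
    """Training phase: build a model (one centroid per label) from labeled data."""
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    # Average each feature column over the examples that share a label.
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_label.items()}

def predict(model, x):
    """Testing phase: assign the label whose centroid is closest to x."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: sq_dist(model[y]))

# Hypothetical data: each row holds three binary genetic markers (1 = active).
train_x = [[1, 1, 0], [1, 0, 0], [1, 1, 1],
           [0, 0, 1], [0, 1, 0], [0, 0, 0]]
train_y = ["disease", "disease", "disease",
           "healthy", "healthy", "healthy"]

model = train(train_x, train_y)       # training: done once, on all the data
prediction = predict(model, [1, 0, 1])  # testing: applied to a new example
```

In this toy case both phases are cheap. The sketch only shows the division of labor: training reads the whole data set to build the model, while testing consults the model alone.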
ML algorithms are commonly divided according to the data they use and
the model they build. For example, generative algorithms construct a model for
generating random samples of the observed data, usually assuming some hidden
parameters. In contrast, discriminative methods build a model for differentiating
between examples of different labels, where all parameters of the model are directly
measurable. In this chapter, we focus on discriminative algorithms.
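The distinction can be made concrete with a small one-dimensional sketch. It assumes a per-label Gaussian as the generative model and a single decision threshold as the discriminative model; both choices, the equal-prior assumption, and the data are illustrative and are not taken from this chapter.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical 1-D measurements for two labels.
data = {"a": [1.0, 1.5, 2.0, 2.5], "b": [4.0, 4.5, 5.0, 5.5]}

# Generative: estimate per-label Gaussian parameters, which makes it
# possible to draw new random samples that resemble the observed data.
params = {label: (mean(xs), pstdev(xs)) for label, xs in data.items()}

def sample(label, rng):
    mu, sigma = params[label]
    return rng.gauss(mu, sigma)

def log_density(x, mu, sigma):
    # Gaussian log-density up to an additive constant shared by all labels.
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def generative_predict(x):
    # Assumes equal class priors: pick the label under which x is most likely.
    return max(params, key=lambda label: log_density(x, *params[label]))

# Discriminative: model only the boundary that separates the labels.
def fit_threshold(data):
    points = sorted((x, label) for label, xs in data.items() for x in xs)
    best_t, best_err = None, None
    # Candidate thresholds are midpoints between neighboring observations.
    for (x0, _), (x1, _) in zip(points, points[1:]):
        t = (x0 + x1) / 2
        err = sum((x > t) != (label == "b") for x, label in points)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

threshold = fit_threshold(data)

def discriminative_predict(x):
    return "b" if x > threshold else "a"
```

Only the generative model can produce new samples, for example `sample("a", random.Random(0))`. The discriminative model stores nothing but the threshold it needs to tell the labels apart.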
Discriminative algorithms are often used to classify data according to labeled
examples of previous data. Therefore, the training data is provided with both inputs