Machine Learning Techniques for Large Data - High-Throughput Image Reconstruction and Analysis

CHAPTER 7

Machine Learning Techniques for Large Data

Elad Yom-Tov

The field of machine learning is devoted to the study and development of algorithms that attempt to learn from data; that is, they extract rules and patterns from data provided to them. In this chapter we survey different types of machine learning algorithms, with a focus on algorithms for pattern classification and analysis. We further emphasize algorithms that are useful for processing large data, such as that prevalent in computational biology.

7.1 Introduction

Machine learning (ML) methods are algorithms for identifying patterns in large data collections and for applying actions based on these patterns. ML methods are data driven in the sense that they extract rules from given data. ML algorithms are being successfully applied in many domains, including analysis of Internet data, fraud detection, bioinformatics, computer network analysis, and information retrieval.

Watanabe [1] described a pattern as "the opposite of chaos; it is an entity, vaguely defined, that could be given a name." Examples of patterns are DNA sequences that may cause a certain disease, human faces, and behavior patterns. A pattern is described by its features. These are the characteristics of the examples for a given problem. For example, in a bioinformatics task, features could be the genetic markers that are active or inactive for a given disease.
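
As a minimal sketch of what "described by its features" can mean in practice, the Python fragment below encodes one sample as a 0/1 vector over a fixed list of genetic markers. The marker names and the helper `to_feature_vector` are hypothetical and serve only as illustration; they do not come from the chapter.

```python
# Hypothetical marker names, chosen only for illustration.
MARKERS = ["marker_a", "marker_b", "marker_c", "marker_d"]


def to_feature_vector(active_markers):
    """Encode a sample as a 0/1 vector: 1 if the marker is active, else 0."""
    active = set(active_markers)
    unknown = active - set(MARKERS)
    if unknown:
        raise ValueError(f"unknown markers: {sorted(unknown)}")
    # The position in the vector is fixed by the order of MARKERS.
    return [1 if name in active else 0 for name in MARKERS]


sample = to_feature_vector(["marker_b", "marker_d"])
print(sample)  # [0, 1, 0, 1]
```

Fixing the order of the markers once means that every sample maps to a vector of the same length, which is the form most learning algorithms expect as input.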

Once data is collected, ML methods are usually applied in two stages: training and testing. During the training phase, data is given to the learning algorithms, which then construct a model of the data. The testing phase consists of generating predictions for new data according to the model. For most algorithms, the training phase is the computationally intensive phase of learning. Testing (or applying) the model is generally orders of magnitude faster than the training phase.
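
The two stages can be sketched as a pair of functions. The example below uses a nearest-centroid classifier, which is an assumption made for brevity and not a method singled out by the chapter; the function names `train` and `predict` and the toy data are likewise illustrative.

```python
import math
from collections import defaultdict


def train(examples, labels):
    """Training phase: build a model (one centroid per label) from labeled data."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(examples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    # The model is a mapping from each label to the mean of its examples.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Testing phase: assign x the label of the closest centroid."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Toy data: two well-separated groups in the plane.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]

model = train(train_x, train_y)
print(predict(model, [0.5, 0.5]))  # a
print(predict(model, [5.5, 5.0]))  # b
```

In this sketch the training step has to visit every training example, while a prediction only compares the new point against one centroid per label, so the cost of applying the model does not grow with the size of the training set.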

ML algorithms are commonly divided according to the data they use and the model they build. For example, generative algorithms construct a model for generating random samples of the observed data, usually assuming some hidden parameters. In contrast, discriminative methods build a model for differentiating between examples of different labels, where all parameters of the model are directly measurable. In this chapter, we focus on discriminative algorithms.
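
The contrast can be illustrated loosely on one-dimensional data. In the sketch below, the generative side fits a Gaussian per label (assuming equal class priors) and can draw new samples from it, while the discriminative side is a perceptron that only learns a decision boundary. Both model choices, all function names, and the toy data are assumptions made for illustration and are not taken from the chapter.

```python
import math
import random


def fit_generative(xs, ys):
    """Generative sketch: estimate a 1-D Gaussian (mean, std) for each label."""
    model = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[label] = (mean, math.sqrt(var) or 1e-6)  # guard against zero spread
    return model


def sample_generative(model, label, rng):
    """Draw a new synthetic observation for the given label."""
    mean, std = model[label]
    return rng.gauss(mean, std)


def classify_generative(model, x):
    """Pick the label whose Gaussian gives x the highest log-density."""
    def log_density(mean, std):
        return -math.log(std) - 0.5 * ((x - mean) / std) ** 2
    return max(model, key=lambda label: log_density(*model[label]))


def fit_discriminative(xs, ys, epochs=1000):
    """Discriminative sketch: perceptron on labels -1/+1, learning w and b only."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * (w * x + b) <= 0:  # misclassified: nudge the boundary
                w += y * x
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: stop
            break
    return w, b


def classify_discriminative(params, x):
    w, b = params
    return 1 if w * x + b > 0 else -1


xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
ys = [-1, -1, -1, 1, 1, 1]

gen = fit_generative(xs, ys)
dis = fit_discriminative(xs, ys)
print(classify_generative(gen, 1.2), classify_discriminative(dis, 1.0))
print(sample_generative(gen, 1, random.Random(0)))  # a synthetic point for label +1
```

Only the generative model can produce new data points; the perceptron stores nothing about how the data is distributed, only where the boundary between the labels lies.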

Discriminative algorithms are often used to classify data according to labeled examples of previous data. Therefore, the training data is provided with both inputs
