Large-Scale Machine Learning - Mining of Massive Datasets


12.1 The Machine-Learning Model

In this brief section, we introduce the framework for machine-learning algorithms and give the basic definitions.

12.1.1 Training Sets

The data to which a machine-learning (often abbreviated ML) algorithm is applied is called a training set. A training set consists of a set of pairs (x, y), called training examples, where

• x is a vector of values, often called a feature vector. Each value, or feature, can be categorical (values are taken from a set of discrete values, such as {red, blue, green}) or numerical (values are integers or real numbers).

• y is the label, the classification value for x.
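To make the definition concrete, here is a minimal Python sketch of a training example as an (x, y) pair whose feature vector mixes a categorical and a numerical feature. The names `TrainingExample`, `feature_kind`, and `COLORS` are hypothetical, and representing categorical values as strings is an assumption of this sketch rather than something the text prescribes.

```python
from typing import NamedTuple, Union

# A feature value is categorical (a string drawn from a discrete set)
# or numerical (an int or a float). Strings-for-categories is an assumption.
Feature = Union[str, int, float]


class TrainingExample(NamedTuple):
    x: tuple[Feature, ...]  # the feature vector
    y: object               # the label; its type depends on the problem


# Discrete value set for a categorical "color" feature, as in the text.
COLORS = frozenset({"red", "blue", "green"})


def feature_kind(value: Feature) -> str:
    """Classify a single feature value as categorical or numerical."""
    if isinstance(value, str):
        return "categorical"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numerical"
    raise TypeError(f"unsupported feature value: {value!r}")


# Hypothetical training example: one categorical and one numerical feature.
example = TrainingExample(x=("red", 3.5), y=+1)
assert example.x[0] in COLORS
print([feature_kind(v) for v in example.x])  # ['categorical', 'numerical']
```

Nothing here learns anything yet; the sketch only fixes the shape of the data that an ML algorithm consumes.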

The objective of the ML process is to discover a function y = f(x) that best predicts the value of y associated with each value of x. The type of y is in principle arbitrary, but there are several common and important cases.

(1) y is a real number. In this case, the ML problem is called regression.

(2) y is a boolean value, true or false, more commonly written as +1 and −1, respectively. In this case, the problem is called binary classification.

(3) y is a member of some finite set. The members of this set can be thought of as “classes,” and each member represents one class. The problem is multiclass classification.

(4) y is a member of some potentially infinite set, for example, a parse tree for x, where x is interpreted as a sentence.
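The first three cases can often be told apart, roughly, by looking at the label values. The Python sketch below uses a hypothetical helper `problem_type` with a crude heuristic of our own: labels that are all +1 or −1 suggest binary classification, labels that are all floats suggest regression, and anything else is treated as multiclass. It would misjudge, for instance, regression data whose labels happen to be integers, and case (4) cannot be recognized from label values alone. The helper `to_binary_label` (also hypothetical) shows the +1/−1 encoding of case (2).

```python
def problem_type(labels):
    """Heuristically name the ML problem suggested by a collection of labels.

    Covers cases (1)-(3) only; this is an illustrative rule of thumb,
    not a reliable test.
    """
    values = set(labels)
    if values <= {+1, -1}:
        return "binary classification"          # case (2)
    if all(isinstance(v, float) for v in values):
        return "regression"                     # case (1)
    return "multiclass classification"          # case (3), e.g. dog varieties


def to_binary_label(flag: bool) -> int:
    """Encode true as +1 and false as -1, as in case (2)."""
    return +1 if flag else -1


print(problem_type([+1, -1, -1]))                        # binary classification
print(problem_type(["Beagle", "Chihuahua", "Dachshund"]))  # multiclass classification
```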

12.1.2 Some Illustrative Examples

EXAMPLE 12.1 Recall Fig. 7.1, repeated as Fig. 12.1, where we plotted the height and weight of dogs in three classes: Beagles, Chihuahuas, and Dachshunds. We can think of this data as a training set, provided the data includes the variety of the dog along with each height-weight pair. Each pair (x, y) in the training set consists of a feature vector x of the form [height, weight]. The associated label y is the variety of the dog. An example of a training-set pair would be ([5 inches, 2 pounds], Chihuahua).
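Example 12.1 can be written down as Python data, together with one possible choice of f. Only the Chihuahua pair below comes from the text; the Beagle and Dachshund values are hypothetical placeholders, not readings from Fig. 12.1. The function `predict` is a hypothetical 1-nearest-neighbor rule chosen purely for illustration, since the text has not yet said how f is to be found.

```python
import math

# Each training example is ([height in inches, weight in pounds], variety).
# Only the first pair comes from the text; the other two are hypothetical
# placeholders, not values read from Fig. 12.1.
training_set = [
    ([5.0, 2.0], "Chihuahua"),
    ([14.0, 25.0], "Beagle"),    # hypothetical
    ([9.0, 20.0], "Dachshund"),  # hypothetical
]


def predict(x):
    """A stand-in for f: return the label of the training example whose
    feature vector is closest to x in Euclidean distance (1-nearest neighbor)."""
    _, label = min(training_set, key=lambda pair: math.dist(pair[0], x))
    return label


print(predict([6.0, 3.0]))  # Chihuahua
```

Because height and weight are measured in different units, a practical distance-based rule would usually rescale the features first; the sketch skips that step to keep the (x, y) structure in view.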
