12.1 The Machine-Learning Model
In this brief section we introduce the framework for machine-learning algorithms and give
the basic definitions.
12.1.1 Training Sets
The data to which a machine-learning (often abbreviated ML) algorithm is applied is called
a training set. A training set consists of a set of pairs (x, y), called training examples, where
• x is a vector of values, often called a feature vector. Each value, or feature, can
be categorical (values are taken from a set of discrete values, such as {red, blue,
green}) or numerical (values are integers or real numbers).
• y is the label, the classification value for x.
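As a rough illustration of this definition, a training set can be held as a list of (feature vector, label) pairs. The sketch below is only one possible representation; the type aliases, the helper `is_categorical`, and the sample values are assumptions made for illustration, not part of the text.

```python
from typing import List, Tuple, Union

# Hypothetical type aliases: a feature is categorical (str) or numerical (int/float).
Feature = Union[str, int, float]
FeatureVector = List[Feature]
Label = Union[str, int, float]
TrainingExample = Tuple[FeatureVector, Label]

# Illustrative values only: one categorical feature and two numerical features per vector.
training_set: List[TrainingExample] = [
    (["red", 3, 0.5], +1),
    (["blue", 7, 1.25], -1),
    (["green", 1, 2.0], +1),
]

def is_categorical(value: Feature) -> bool:
    """Treat strings as categorical and numbers as numerical (an assumption of this sketch)."""
    return isinstance(value, str)

for x, y in training_set:
    kinds = ["categorical" if is_categorical(v) else "numerical" for v in x]
    print(x, kinds, "label:", y)
```

Encoding categorical values as strings is a convenience here; a real system might map them to integers or one-hot vectors instead.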
The objective of the ML process is to discover a function y = f(x) that best predicts the
value of y associated with each value of x. The type of y is in principle arbitrary, but there
are several common and important cases.
(1) y is a real number. In this case, the ML problem is called regression.
(2) y is a boolean value, true or false, more commonly written as +1 and −1, respectively.
In this case, the problem is called binary classification.
(3) y is a member of some finite set. The members of this set can be thought of as
“classes,” and each member represents one class. The problem is called multiclass
classification.
(4) y is a member of some potentially infinite set, for example, a parse tree for x, which is
interpreted as a sentence.
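The first three cases can be told apart, at least roughly, by looking at the labels alone. The helper below, `problem_kind`, is a hypothetical heuristic written for this illustration; it assumes regression labels are Python floats and binary labels are written as +1 and −1, and it does not attempt to handle case (4).

```python
def problem_kind(labels):
    """Guess the problem type from the labels (a heuristic sketch, not a general rule)."""
    distinct = set(labels)
    # Case (2): only the two values +1 and -1 occur.
    if distinct <= {+1, -1}:
        return "binary classification"
    # Case (1): every label is a real number (assumed here to be a Python float).
    if all(isinstance(y, float) for y in labels):
        return "regression"
    # Case (3): anything else is treated as a finite set of classes.
    return "multiclass classification"

print(problem_kind([2.5, 3.1, 0.7]))
print(problem_kind([+1, -1, -1, +1]))
print(problem_kind(["Beagle", "Chihuahua", "Dachshund"]))
```

In practice the problem type is usually known from the task itself; inferring it from the data, as done here, is shown only to make the distinctions concrete.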
12.1.2 Some Illustrative Examples
EXAMPLE 12.1 Recall Fig. 7.1, repeated as Fig. 12.1, where we plotted the height and
weight of dogs in three classes: Beagles, Chihuahuas, and Dachshunds. We can think of
this data as a training set, provided the data includes the variety of the dog along with each
height-weight pair. Each pair (x, y) in the training set consists of a feature vector x of the
form [height, weight]. The associated label y is the variety of the dog. An example of a
training-set pair would be ([5 inches, 2 pounds], Chihuahua).
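To make the example concrete, the sketch below stores a tiny dog training set and defines one possible f(x). Only the Chihuahua pair comes from the text; the Beagle and Dachshund measurements are made up for illustration. The predictor is a 1-nearest-neighbor rule, chosen here as a simple stand-in and not as the method the text has in mind.

```python
import math

# Feature vector is [height in inches, weight in pounds]; label is the variety.
training_set = [
    ([5.0, 2.0], "Chihuahua"),    # the pair given in the text
    ([14.0, 22.0], "Beagle"),     # made-up values, for illustration only
    ([8.0, 18.0], "Dachshund"),   # made-up values, for illustration only
]

def f(x):
    """Predict the label of x as the label of the closest training example."""
    _, nearest_label = min(training_set, key=lambda pair: math.dist(pair[0], x))
    return nearest_label

print(f([6.0, 3.0]))  # closest to the Chihuahua example
```

Euclidean distance mixes inches and pounds without any scaling, which is crude; it is used only to keep the sketch short.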