An appropriate way to implement the decision function f would be to imagine two lines, one horizontal and one vertical. The horizontal line represents a height of 7 and separates Beagles from Chihuahuas and Dachshunds. The vertical line represents a weight of 3 pounds and separates Chihuahuas from Beagles and Dachshunds. The algorithm that implements f is:
if (height > 7) print Beagle
else if (weight < 3) print Chihuahua
else print Dachshund;
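The same two-threshold rule can be written as a runnable function. This is a minimal sketch in Python: the function name `classify_dog` and the sample measurements are hypothetical, chosen only to exercise each branch, and only the thresholds (height 7, weight 3) come from the text.

```python
def classify_dog(height: float, weight: float) -> str:
    """Return the variety predicted by the two threshold tests in the text."""
    if height > 7:        # above the horizontal line
        return "Beagle"
    if weight < 3:        # left of the vertical line
        return "Chihuahua"
    return "Dachshund"    # everything else


# Hypothetical (height, weight) measurements, one per branch of the rule.
samples = [(9.0, 20.0), (5.0, 2.0), (6.0, 10.0)]
predictions = [classify_dog(h, w) for h, w in samples]
print(predictions)
```

Returning the label instead of printing it keeps the rule usable as a decision function f that other code can call.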
Recall that the original intent of Fig. 7.1 was to cluster points without knowing which variety of dog they represented. That is, the label associated with a given height-weight vector was not available. Here, we are performing supervised learning with the same data augmented by classifications for the training data.
□
EXAMPLE 12.2 As an example of supervised learning, consider a training set of four points, including (1, 2), (2, 1), and (3, 4), where the vectors are one-dimensional. That is, the point (1, 2) can be thought of as a pair ([1], 2), where [1] is the one-dimensional feature vector x, and 2 is the associated label y; the other points can be interpreted similarly.
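The pairing of a feature vector with a label can be sketched in Python as follows. The names `TrainingExample` and `to_training_pair` are hypothetical, and the sketch uses only the point (1, 2) from the worked pair together with the listed points (2, 1) and (3, 4); it does not guess at any point the text does not show.

```python
from typing import List, Tuple

# A training example is a (feature vector x, label y) pair.
TrainingExample = Tuple[List[float], float]


def to_training_pair(point: Tuple[float, float]) -> TrainingExample:
    """Split a 2-D point into a one-dimensional feature vector and a label."""
    x, y = point
    return ([float(x)], float(y))


# Points taken from the text; each first coordinate becomes the vector [x].
points = [(1, 2), (2, 1), (3, 4)]
training_set = [to_training_pair(p) for p in points]

for features, label in training_set:
    print(features, label)
```

Keeping x as a list, even with a single component, means the same representation carries over unchanged to higher-dimensional feature vectors.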
Suppose we want to “learn” the linear function f(x) = ax + b that best represents the points of the training set. A natural interpretation of “best” is that the RMSE of the value of f(x) compared with the given value of y is minimized. That is, we want to minimize