then the second term of Equation 12.4 will always be 0, which simplifies your work
considerably.
! EXERCISE 12.3.3 The following training set obeys the rule that the positive examples all
have vectors whose components have an odd sum, while the sum is even for the negative
examples.
([1, 2], +1) ([3, 4], +1) ([5, 2], +1)
([2, 4], −1) ([3, 1], −1) ([7, 3], −1)
(a) Suggest a starting vector w and constant b that classify at least three of the points
correctly.
!! (b) Starting with your answer to (a), use gradient descent to find the optimal w and b.
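Equation 12.4 itself is not reproduced in this excerpt; the sketch below assumes it is the regularized hinge-loss objective f(w, b) = ||w||^2/2 + C * sum_i max(0, 1 - y_i(w.x_i + b)), consistent with the remark above that its second term is 0 when every point is classified correctly with a margin of at least 1. The step size eta, the penalty C, the iteration count, and the starting point are illustrative choices, not part of the exercise.

    import numpy as np

    # Training set from Exercise 12.3.3.
    X = np.array([[1, 2], [3, 4], [5, 2], [2, 4], [3, 1], [7, 3]], dtype=float)
    y = np.array([+1, +1, +1, -1, -1, -1], dtype=float)

    w = np.zeros(2)      # replace with the starting vector from part (a)
    b = 0.0
    eta, C = 0.01, 1.0   # illustrative step size and slack penalty

    for _ in range(1000):
        margins = y * (X @ w + b)
        viol = margins < 1                 # points violating the margin
        # Subgradient of the hinge term is -y_i * x_i at each violating point.
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= eta * grad_w
        b -= eta * grad_b

    print(w, b)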
12.4 Learning from Nearest Neighbors
In this section we consider several examples of “learning,” where the entire training set is
stored, perhaps preprocessed in some useful way, and then used to classify future examples
or to compute the value of the label that is most likely associated with the example. The
feature vector of each training example is treated as a data point in some space. When a
new point arrives and must be classified, we find the training example or examples that are
closest to the new point, according to the distance measure for that space. The estimated
label is then computed by combining the closest examples in some way.
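As a concrete illustration of this framework, here is a minimal sketch of the simplest case: Euclidean distance and a single nearest neighbor, whose label is returned directly. The function names and the small training set are hypothetical, chosen only for the example.

    import math

    def euclidean(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def nearest_label(training_set, query):
        # Return the label of the stored example closest to the query point.
        point, label = min(training_set, key=lambda ex: euclidean(ex[0], query))
        return label

    training_set = [([1, 2], +1), ([3, 4], +1), ([2, 4], -1)]
    print(nearest_label(training_set, [2, 3]))   # [2, 4] is closest, so -1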
12.4.1 The Framework for Nearest-Neighbor Calculations
The training set is first preprocessed and stored. The decisions take place when a new
example, called the query example, arrives and must be classified.
There are several decisions we must make in order to design a nearest-neighbor-based
algorithm that will classify query examples. We enumerate them here:
(1) What distance measure do we use?
(2) How many of the nearest neighbors do we look at?
(3) How do we weight the nearest neighbors? Normally, we provide a function (the kernel
function) of the distance between the query example and its nearest neighbors in the
training set, and use this function to weight the neighbors, as in the sketch below.
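The following sketch ties the three decisions together, assuming Euclidean distance for (1), k = 3 for (2), and a Gaussian kernel of the distance for (3). The value of k, the kernel width sigma, and the function names are illustrative assumptions, not choices prescribed by the text.

    import math

    def knn_classify(training_set, query, k=3, sigma=1.0):
        # Decision (1): Euclidean distance between feature vectors.
        def dist(p, q):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

        # Decision (2): keep only the k training examples nearest the query.
        neighbors = sorted(training_set, key=lambda ex: dist(ex[0], query))[:k]

        # Decision (3): weight each neighbor's label by a kernel function
        # of its distance to the query, here a Gaussian.
        score = 0.0
        for x, label in neighbors:
            weight = math.exp(-dist(x, query) ** 2 / (2 * sigma ** 2))
            score += weight * label
        return +1 if score >= 0 else -1

A weighted vote of this kind lets nearby neighbors dominate distant ones; with a constant kernel it degenerates into an ordinary majority vote among the k nearest neighbors.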