decays with distance. A popular choice is to use a normal distribution (or "bell curve"), so
the weight of a training point x when the query is q is e^(−(x−q)²/σ²). Here σ is the standard
deviation of the distribution and the query q is the mean. Roughly, points within distance σ of
q are heavily weighted, and those farther away have little weight. The advantage of using
a kernel function that is itself continuous and defined for all points in the training
set is that the resulting function learned from the data is guaranteed to be continuous (see
Exercise 12.4.6 for a discussion of the problem when a simpler weighting is used).
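The Gaussian weighting just described can be sketched as a short function. This is a minimal illustration, not from the text; the function name and the sample values are our own:

```python
import math

def gaussian_weight(x, q, sigma):
    """Weight of training point x for query q under a Gaussian kernel,
    e^(-(x-q)^2 / sigma^2), with the query q acting as the mean."""
    return math.exp(-((x - q) ** 2) / sigma ** 2)

# A point within sigma of the query keeps substantial weight,
# while one several sigmas away is weighted almost zero.
print(gaussian_weight(1.0, 1.5, 1.0))  # distance 0.5: e^(-0.25) ≈ 0.78
print(gaussian_weight(1.0, 4.0, 1.0))  # distance 3:   e^(-9)    ≈ 0.0001
```

Note the weight is 1 exactly when x = q and never reaches 0, so every training point contributes at least a little to every query.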
EXAMPLE 12.13 Let us use the seven training examples of Example 12.12. To simplify
calculation, we shall not use the normal distribution as the kernel function, but rather
another continuous function of distance, namely w = 1/(x − q)². That is, weights decay as
the square of the distance. Suppose the query q is 3.5. The weights w₁, w₂, …, w₇ of the
seven training examples (xᵢ, yᵢ) = (i, 8/2^|i−4|) for i = 1, 2, …, 7 are shown in Fig. 12.24.
(1)  xᵢ        1      2      3      4      5      6      7
(2)  yᵢ        1      2      4      8      4      2      1
(3)  wᵢ       4/25   4/9     4      4     4/9    4/25   4/49
(4)  wᵢyᵢ     4/25   8/9    16     32    16/9    8/25   4/49

Figure 12.24 Weights of points when the query is q = 3.5
Lines (1) and (2) of Fig. 12.24 give the seven training points. The weight of each when
the query is q = 3.5 is given in line (3). For instance, for x₁ = 1, the weight is
w₁ = 1/(1 − 3.5)² = 1/(−2.5)² = 4/25. Then, line (4) shows each yᵢ weighted by the weight
from line (3). For instance, the column for x₂ has value 8/9 because w₂y₂ = (4/9) × 2.
Problems in the Limit for Example 12.13
Suppose q is exactly equal to one of the training examples x. If we use the normal distribution as the kernel function,
there is no problem with the weight of x: it is 1. However, with the kernel function discussed in Example 12.13, the
weight of x is 1/(x − q)² = ∞. Fortunately, this weight appears in both the numerator and denominator of the expression
that estimates the label of q. It can be shown that in the limit as q approaches x, the label of x dominates all the
other terms in both numerator and denominator, so the estimated label of q is the same as the label of x. That makes
excellent sense, since q = x in the limit.
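In practice, an implementation of the inverse-square kernel must handle this limit case explicitly, since dividing by zero is not an option. One way, sketched below under our own naming (the function is illustrative, not from the text), is to return the training label directly when the query coincides with a training point:

```python
def inverse_square_estimate(query, points):
    """Kernel-regression estimate using weights w = 1/(x - q)^2.
    If the query equals a training point x, return that point's label,
    which is the value the estimate approaches in the limit q -> x."""
    for x, y in points:
        if x == query:          # limit case: label of x dominates
            return y
    num = sum(y / (x - query) ** 2 for x, y in points)
    den = sum(1 / (x - query) ** 2 for x, y in points)
    return num / den

# The seven training points of Example 12.12: (i, 8/2^|i-4|) for i = 1..7.
training = [(i, 8 / 2 ** abs(i - 4)) for i in range(1, 8)]
print(inverse_square_estimate(3, training))  # exact match: label 4.0
```

This special case is exactly the limit argument above: as q approaches x, the term for x swamps every other term in both sums.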
To compute the label for the query q = 3.5 we sum the weighted values of the labels in
the training set, as given by line (4) of Fig. 12.24; this sum is 51.23. We then divide by the
sum of the weights in line (3). This sum is 9.29, so the ratio is 51.23/9.29 = 5.51. That
estimate of the value of the label for q = 3.5 seems intuitively reasonable, since q lies
midway between two points with labels 4 and 8.
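The whole calculation of Example 12.13 can be reproduced in a few lines; the variable names are ours, but the data and arithmetic follow the example:

```python
# Training points (i, 8/2^|i-4|) for i = 1..7, as in Example 12.12.
training = [(i, 8 / 2 ** abs(i - 4)) for i in range(1, 8)]
q = 3.5

# Line (3) of Fig. 12.24: weights w_i = 1/(x_i - q)^2.
weights = [1 / (x - q) ** 2 for x, _ in training]

# Line (4): each label y_i scaled by its weight.
weighted = [w * y for w, (_, y) in zip(weights, training)]

# Estimate = (sum of weighted labels) / (sum of weights).
estimate = sum(weighted) / sum(weights)
print(round(sum(weighted), 2), round(sum(weights), 2), round(estimate, 2))
# → 51.23 9.29 5.51
```

The printed values match the sums 51.23 and 9.29 and the estimate 5.51 computed in the text.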