EXAMPLE 12.9 Figure 12.18 shows six points, three positive and three negative. We expect that the best separating line will be horizontal, and the only question is whether or not the separating hyperplane and the scale of w allow the point (2, 2) to be misclassified or to lie too close to the boundary. Initially, we shall choose w = [0, 1], a vertical vector with a scale of 1, and we shall choose b = −2. As a result, we see in Fig. 12.18 that the point (2, 2) lies on the initial hyperplane and the three negative points are right at the margin. The parameter values we shall choose for gradient descent are C = 0.1 and η = 0.2.
Figure 12.18 Six points for a gradient-descent example
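To make the starting configuration concrete, here is a minimal Python sketch (our own illustration; neither the code nor the variable names come from the text) that evaluates w · x + b for each of the six points with the initial w = [0, 1] and b = −2. It confirms that (2, 2) lies exactly on the hyperplane (value 0) and that the three negative points sit right at the margin (value −1).

# The six training points of Fig. 12.18 and their class labels.
points = [((1, 4), +1), ((2, 2), +1), ((3, 4), +1),
          ((1, 1), -1), ((2, 1), -1), ((3, 1), -1)]

w = (0.0, 1.0)   # initial weight vector: vertical, with scale 1
b = -2.0         # initial bias

for (x1, x2), y in points:
    value = w[0] * x1 + w[1] * x2 + b   # w . x + b
    print((x1, x2), y, value)

The positive points (1, 4) and (3, 4) give the value 2, so they are comfortably on the correct side of the margin.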
We begin by incorporating b as the third component of w, and for notational convenience, we shall use u and v as the first two components, rather than the customary w1 and w2. That is, we take w = [u, v, b]. We also expand the two-dimensional points of the training set with a third component that is always 1. That is, the training set becomes

([1, 4, 1], +1)   ([2, 2, 1], +1)   ([3, 4, 1], +1)
([1, 1, 1], −1)   ([2, 1, 1], −1)   ([3, 1, 1], −1)
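The reason this bookkeeping works is that a single dot product of the extended vectors reproduces the original affine form: [u, v, b] · [x1, x2, 1] = ux1 + vx2 + b, which is exactly w · x + b for the original two-component w. Consequently the condition y(w · x + b) ≥ 1 can be written simply as y(w · x) ≥ 1 when the extended vectors are used throughout.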
In Fig. 12.19 we tabulate the if-then conditions and the resulting contributions to the summations over i in Equation 12.6. Each summation must be multiplied by C and added to u, v, or b, as appropriate, to implement Equation 12.6.

Figure 12.19 Sum each of these terms and multiply by C to get the contribution of bad points to the derivatives of f with respect to u, v, and b
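Figure 12.19 itself is not reproduced here, but its role can be sketched in code. Assuming Equation 12.6 has the usual hinge-loss form, in which the derivative of f with respect to the jth component of the extended w is w_j plus C times the sum, over the bad points (those with y_i(w · x_i) < 1), of −y_i x_ij, the following Python (our own sketch, not from the text) evaluates the three partial derivatives at any w = [u, v, b]:

# Extended training set: each point carries a third component equal to 1.
X = [(1, 4, 1), (2, 2, 1), (3, 4, 1), (1, 1, 1), (2, 1, 1), (3, 1, 1)]
Y = [+1, +1, +1, -1, -1, -1]
C = 0.1

def partial_derivatives(w):
    # Assumed form of Equation 12.6: df/dw_j = w_j + C * (sum over bad points of -y_i * x_ij).
    derivs = list(w)                                        # the w_j terms
    for x, y in zip(X, Y):
        if y * sum(wj * xj for wj, xj in zip(w, x)) < 1:    # condition fails: a bad point
            for j in range(3):
                derivs[j] += C * (-y) * x[j]                # its contribution to each derivative
    return derivs

print(partial_derivatives([0.0, 1.0, -2.0]))   # derivatives at the starting w = [u, v, b]

Under this assumed form, at the initial w only the point (2, 2) fails its condition, so it alone contributes to the three summations.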
The truth or falsehood of each of the six conditions in Fig. 12.19 determines the contribution of the terms in the summations over i in Equation 12.6. We shall represent the status of each condition by a sequence of x's and o's, with x representing a condition that does not hold and o representing one that does. The first few iterations of gradient descent are shown in Fig. 12.20.
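To illustrate the bookkeeping that Fig. 12.20 tabulates (without reproducing its actual numbers), a self-contained sketch of the descent loop, under the same assumed form of Equation 12.6, would build the x/o status string and then take a step of size η against each partial derivative:

X = [(1, 4, 1), (2, 2, 1), (3, 4, 1), (1, 1, 1), (2, 1, 1), (3, 1, 1)]
Y = [+1, +1, +1, -1, -1, -1]
C, eta = 0.1, 0.2
w = [0.0, 1.0, -2.0]                      # [u, v, b] at the start

for step in range(4):                     # the first few iterations
    bad = [y * sum(wj * xj for wj, xj in zip(w, x)) < 1 for x, y in zip(X, Y)]
    status = ''.join('x' if flag else 'o' for flag in bad)     # x: condition fails, o: it holds
    derivs = [wj + C * sum(-y * x[j] for x, y, flag in zip(X, Y, bad) if flag)
              for j, wj in enumerate(w)]
    print(step, status, [round(c, 4) for c in w])
    w = [wj - eta * d for wj, d in zip(w, derivs)]              # gradient-descent update

Whether the printed numbers match Fig. 12.20 exactly depends on the precise form of Equation 12.6 and on rounding, so the output should be treated as illustrative.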