Information Technology Reference
In-Depth Information
The Lagrangian of the above optimization problem is
α i y i w T x i +
b
ξ i .
1
2
2
L =
w
1
+
i
By setting the derivative of the Lagrangian to be zero, the optimization problem
can be written in its dual form (in terms of α i ),
max
i
1
2 α i α j y i y j x i
α i
x j ,
s.t.
α i y i =
0 , i
0 .
i
This is a typical quadratic programming problem. After solving it, we can recover
the ranking model w = i α i y i x i .
If we use the Karush-Kuhn-Tuker (KTT) conditions to analyze the optimality
of the solutions to the above optimization problem, we will come to the following
conclusion:
y i w T x i + b
α i =
0
1 ,
y i w T x i +
b =
0 i <C
1 ,
y i w T x i +
b
α i =
C
1 .
That is, the hyperplane w is the linear combination of a small number of samples.
These samples x i with non-zero α i are called support vectors. This is why we have
the name “support vector machines” for the algorithm.
Note that we have explained support vector machines using the linear model as
an example. One can replace the inner product x i x j in the above discussions with
a kernel function K(x i ,x j ) to extend the algorithm to the non-linear case. Typical
kernel functions include the polynomial kernel and the RBF kernel. Kernel functions
can also be regarded as similarity measures between input objects, just as the inner
product. The well-known Mercer's condition states that any positive semi-definite
kernel K(
) , i.e., i,j K(x i ,x j )c i c j
0, can be expressed as an inner product
in a high dimensional space. Therefore, by using the kernel function, we actually
significantly increase the dimensionality of the input space, however, at the same
time, keep the computational complexity almost unchanged.
Support vector machines have been applied in many applications, and have
achieved great success. This is in part because the algorithm is very simple and
clear, and in part because the algorithm has its theoretical guarantee on the gen-
eralization ability. The generalization ability is related to the concept of the VC
dimension, which will be explained more in the next section.
·
,
·
Search WWH ::




Custom Search