Remark 1. If the training set is not linearly separable in the feature space determined by the function Φ(x), the algorithm does not converge and does not even provide an approximate solution.
Remark 2. The same algorithm of constrained quadratic minimization may
be used to find the maximal margin perceptron in input space, if the training
set is known to be linearly separable. Otherwise, one has to map the patterns
to a feature space of dimension high enough to guarantee linear separation.
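To make Remark 2 concrete, here is a minimal sketch in Python; the XOR-like data, the quadratic feature map and the separating weight vector are hypothetical choices for this illustration, not taken from the text. The training set is not linearly separable in the two-dimensional input space, but becomes separable once the patterns are mapped to a three-dimensional feature space.

```python
import numpy as np

# XOR-like training set: not linearly separable in the 2-D input space.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

def phi(x):
    """Hypothetical quadratic feature map: appends the product of the coordinates."""
    return np.array([x[0], x[1], x[0] * x[1]])

# In the 3-D feature space the hyperplane w . Phi(x) = 0 with w = (0, 0, -1)
# separates the classes: y_k * (w . Phi(x_k)) > 0 for every example.
w = np.array([0.0, 0.0, -1.0])
margins = y * np.array([w @ phi(x) for x in X])
print(margins)  # [1. 1. 1. 1.] -- every example satisfies the hard margin constraint
```

Once the patterns are mapped in this way, the constrained quadratic minimization mentioned above can be applied to the mapped examples exactly as in the linearly separable case.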
If the hard margin constraints cannot be satisfied, two possibilities arise:
either another mapping ϕ is chosen, or the constraint that all examples should be correctly classified and lie outside the margin is relaxed. In the latter case, new variables ζ_k, called slack variables, are introduced. Then the
function to be minimized is
$$\Gamma(C) \;=\; \frac{1}{2}\, w \cdot w \;+\; C \sum_{k=1}^{M} (\zeta_k)^n ,$$
where C is a positive hyperparameter (C > 0) to be chosen, and n a positive exponent (n > 0), under the constraints
$$y_k\, \Phi(x_k) \cdot w \;\geq\; 1 - \zeta_k , \qquad \zeta_k \geq 0 .$$
The minimization of Γ(C) under the above two constraints defines the soft margin SVM. Clearly, the slack variables allow the hard margin constraints to be violated. If 0 < ζ_k < 1, the distance of the examples to the hyperplane is smaller than 1/‖w‖, but they are correctly classified. Conversely, examples with ζ_k > 1 are incorrectly classified. In order to minimize the number of misclassifications, they must be penalized: that is the rationale for including the second term in the cost function Γ(C). The larger the exponent n, the more heavily misclassifications are penalized. However, in order to keep a quadratic minimization problem, the possible values of n are restricted to n = 1 or
n = 2. In such conditions, the solution of the soft margin problem is still
unique, and can be found using algorithms of quadratic minimization under
constraints. The solution has the same form as the above expression of w, with the examples with ζ_k ≠ 0 included in the sum. In principle, the coefficient C in Γ(C) is arbitrary. Its value controls the relative importance given to the training errors with respect to the margin maximization. Theoretical studies have shown that the value of C has a strong influence on the generalization properties of soft margin SVMs. The choice of its value is a practical problem, and several heuristics have been proposed for applications of the algorithm.
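As a rough illustration of the role of C, the following Python sketch minimizes Γ(C) for n = 2, working directly in input space (Φ taken as the identity) and without a bias term, on hypothetical two-class data. It uses the fact that, for a fixed w, the constraints are satisfied at minimal cost by ζ_k = max(0, 1 − y_k w · x_k), so Γ(C) can be minimized over w alone; the data, the values of C and the use of scipy's BFGS routine are assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-class data in the plane (Phi = identity, no bias term).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.5, 1.0, (20, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

def gamma(w, C):
    """Gamma(C) for n = 2: for fixed w the optimal slack is zeta_k = max(0, 1 - y_k w.x_k)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * w @ w + C * np.sum(slack ** 2)

# Sweep a few values of C to see the trade-off between margin width and slack.
for C in (0.01, 1.0, 100.0):
    res = minimize(gamma, x0=np.zeros(2), args=(C,), method="BFGS")
    w_opt = res.x
    violations = int(np.sum(y * (X @ w_opt) < 1.0))
    print(f"C={C:7.2f}  ||w||={np.linalg.norm(w_opt):.3f}  margin violations={violations}")
```

Typically, larger values of C penalize the slack variables more heavily and lead to a larger ‖w‖, hence a narrower margin, with fewer margin violations on the training set; smaller values tolerate more violations in exchange for a wider margin, which is precisely the trade-off discussed above.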
In practice, there is no need to define the mapping ϕ : x → Φ(x) explicitly. In fact, from the expression of w, classifying a new input x only requires the scalar products Φ(x) · Φ(x_k) of the input features with those of the support vectors.
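The following sketch illustrates that point: once w is expressed as a weighted sum of mapped support vectors, the class of a new input is obtained from kernel evaluations K(x, x_k) = Φ(x) · Φ(x_k) alone, without ever computing Φ explicitly. The Gaussian kernel, the support vectors and the dual coefficients α_k below are hypothetical placeholders; in an actual SVM the α_k would come from solving the quadratic programming problem.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """K(a, b) = Phi(a) . Phi(b) for the Gaussian feature map, never computed explicitly."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

# Hypothetical support vectors, labels and dual coefficients alpha_k;
# in a real SVM these would come from solving the dual quadratic program.
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]])
y_sv = np.array([1, 1, -1])
alpha = np.array([0.7, 0.3, 1.0])

def classify(x):
    """Sign of sum_k alpha_k y_k K(x, x_k): only scalar products in feature space are needed."""
    s = sum(a * yk * gaussian_kernel(x, xk)
            for a, yk, xk in zip(alpha, y_sv, support_vectors))
    return 1 if s >= 0 else -1

print(classify(np.array([0.5, 0.5])))    # close to the positive support vectors -> +1
print(classify(np.array([-1.2, -0.8])))  # close to the negative support vector -> -1
```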