According to the data, the performance of SVMs is similar to or even superior to that of a neural network or a Gaussian mixture model. They directly implement the principle of structural risk minimization [15] and work by mapping the training points into a high-dimensional feature space, where a separating hyperplane (w, b) is found by maximizing the distance from the closest data points (boundary optimization).
optimization). Given a set of training samples S =
{
( x i ,y i )
|
i =1 , .., m
}
,where
x i
1 are class labels for a 2-class problem,
SVMs attempt to find a classifier h ( x ), which minimizes the expected misclassi-
fication rate. A linear classifier h ( x ) is a hyperplane, and can be represented as
h ( x )= sign ( w T x + b ). The optimal SVM classifier can then be found by solving
a convex quadratic optimization problem:
R n are input patterns, y i
+1 ,
2 + C i =1 ξ i subject to
1
max
2
w
w,b
(8)
y i (
w, x i
+ b )
1
ξ i and ξ i
0
where b is the bias, w is the weight vector, and C is the regularization parameter used to balance the classifier's complexity against its classification accuracy on the training set S. Simply replacing the vector inner product with a non-linear kernel function converts the linear SVM into a more flexible non-linear classifier, which is the essence of the well-known kernel trick. In this case, the quadratic problem is generally solved through its dual formulation:
L(w, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y_i y_j \alpha_i \alpha_j K(x_i, x_j) \qquad (9)

subject to 0 \leq \alpha_i \leq C and \sum_{i=1}^{m} \alpha_i y_i = 0
where the α_i are the coefficients obtained by maximizing the Lagrangian. For training samples x_i whose functional margin is one (and which hence lie closest to the hyperplane), α_i > 0. Only these instances are involved in the weight vector, and hence they are called the support vectors [12]. The non-linear SVM classification function (optimum separating hyperplane) is then formulated in terms of these kernels as:
h(x) = \operatorname{sign}\left( \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b \right) \qquad (10)
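To make the classification rule in Eq. (10) concrete, the following sketch (not taken from the paper) evaluates h(x) with a Gaussian RBF kernel, assuming that the support vectors x_i, their labels y_i, the multipliers α_i, and the bias b have already been obtained by solving the dual problem (9); all numeric values are purely illustrative.

```python
# A minimal sketch of the decision rule in Eq. (10), assuming a Gaussian RBF
# kernel and that the support vectors, their labels y_i, the multipliers
# alpha_i and the bias b were already obtained from the dual problem (9).
import numpy as np

def rbf_kernel(x_i, x, gamma=0.5):
    """K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    return np.exp(-gamma * np.sum((x_i - x) ** 2))

def svm_decision(x, support_vectors, labels, alphas, b, gamma=0.5):
    """h(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)

# Illustrative values only (not from the paper).
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0]])
labels = np.array([+1, -1])
alphas = np.array([0.7, 0.7])   # 0 <= alpha_i <= C holds for support vectors
b = 0.0
print(svm_decision(np.array([0.2, 0.9]), support_vectors, labels, alphas, b))
```

In practice the α_i and b would be obtained from a library solver (e.g. scikit-learn's SVC) rather than by hand-coding the quadratic program.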
4.2 M-SVM Classifiers
M-SVM is based on the Learn++ algorithm. The latter generates a number of weak classifiers from a data set with known labels. Depending on the errors of the weak classifier that was generated, the algorithm modifies the distribution over the elements of the subset so as to strengthen the presence of the instances that are most difficult to classify. This procedure is then repeated with a different subset drawn from the same dataset, and new classifiers are generated. By combining their outputs according to Littlestone's weighted majority voting scheme, we obtain the final classification rule.
The weak classifiers are classifiers that provide only a rough estimate of a decision rule (about 50% or more correct classification), because they must be very quick to generate. A strong classifier, by contrast, spends most of its training time refining its decision criteria. Finding a weak classifier is not a trivial problem.
 