By definition, the VC dimension is the capacity of the learning algorithm to shatter points in the input space (Vapnik, 1998). Formally, it is the cardinality of the largest set of points that the algorithm can shatter. The importance of the VC dimension is that it appears explicitly in the bound on the total error of an algorithm. The total error of the learning machine is the sum of the training error (empirical error) and the testing error (generalization error):
$$\varepsilon = \varepsilon_{emp} + \varepsilon_{g}$$

where ε_emp is the training error and ε_g is the generalization error (Vapnik, 1998). ε_emp can be made arbitrarily small by choosing a machine with a VC dimension at least equal to the number of training points; in order to decrease the training error, one has to increase the VC dimension. If a machine can separate all the training points without any errors, it will have ε_emp = 0. Vapnik has established a bound on the testing error given by:

$$\varepsilon_{g} < \sqrt{\frac{VC\left(\ln\frac{2l}{VC} + 1\right) - \ln\frac{\eta}{4}}{l}}$$

where l is the number of training points, VC is the VC dimension (referred to as h in some references), and 1 - η is the probability with which this inequality holds. The inequality shows that the generalization error is bounded by an increasing function of the VC dimension, and thus a trade-off must be made between the empirical error and the generalization error.
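To see how this bound behaves, the following minimal sketch evaluates its right-hand side for a few VC dimensions; the values of l, VC and η are arbitrary illustrative choices, not taken from the text.

```python
import math

def vc_confidence(l, vc, eta):
    """Right-hand side of Vapnik's bound on the generalization error:
    sqrt((VC * (ln(2l/VC) + 1) - ln(eta/4)) / l)."""
    return math.sqrt((vc * (math.log(2 * l / vc) + 1) - math.log(eta / 4)) / l)

# Arbitrary illustrative values: 1000 training points, confidence 1 - eta = 0.95.
for vc in (10, 50, 200):
    print(vc, round(vc_confidence(l=1000, vc=vc, eta=0.05), 3))
# The printed bound grows with the VC dimension, which is exactly the
# trade-off discussed above.
```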
Although it is extremely difficult and sometimes impossible to compute the VC dimension of a given algorithm, a bound on the VC dimension has been established and will be very useful in building the confidence measure. Vapnik states that a bound on the VC dimension is given by:

$$VC < \|w\|^{2} D^{2}$$

where D is the minimum radius of the sphere that contains all the training points and ||w|| is the norm of the weight vector that the SVM is minimizing. ||w||² is given by (Vapnik, 1998):

$$\|w\|^{2} = \sum_{i=1}^{n} \alpha_{i}$$
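This identity can be seen from the optimality conditions of the hard-margin SVM; sketching the standard argument for the separable case, the KKT complementarity conditions give α_i [y_i (w·x_i + b) − 1] = 0 for every i, so

$$\sum_{i=1}^{n}\alpha_{i} = \sum_{i=1}^{n}\alpha_{i}\,y_{i}\,(w\cdot x_{i}+b) = w\cdot\Big(\sum_{i=1}^{n}\alpha_{i}y_{i}x_{i}\Big) + b\sum_{i=1}^{n}\alpha_{i}y_{i} = \|w\|^{2},$$

using w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0 at the optimum.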
This bound is important in two ways: it is easy to compute, and Burges has shown that the true VC dimension is closely related to it. In particular, he showed that, most of the time, the true minimum of the VC dimension is obtained when this bound is minimal (Burges, 1998).
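As an illustration of how easily this bound can be evaluated, the sketch below assumes S is a nearly hard-margin linear SVM trained with scikit-learn; the synthetic data, the large value of C, and the centroid-based over-estimate of D (the exact D would require solving a minimum-enclosing-ball problem) are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as a stand-in for a separable training set.
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# A large C approximates the hard-margin SVM assumed by the identity above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# ||w||^2 two ways: sum of the dual variables alpha_i, and directly from w.
w_norm_sq_dual = float(np.abs(clf.dual_coef_).sum())   # sum_i alpha_i
w_norm_sq_primal = float(np.sum(clf.coef_ ** 2))        # w . w

# Over-estimate of D: radius of the centroid-centered sphere containing X.
D = float(np.linalg.norm(X - X.mean(axis=0), axis=1).max())

print("||w||^2 (dual):  ", w_norm_sq_dual)
print("||w||^2 (primal):", w_norm_sq_primal)
print("VC bound ||w||^2 * D^2:", w_norm_sq_primal * D ** 2)
```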
Notation. The following notation will be used in the definitions and propositions that follow:

X = {x_1, x_2, ..., x_n} is the set of training vectors, where x_i ∈ R^m, and Y = {y_1, y_2, ..., y_n} is the set of their corresponding classes, where y_i ∈ {-1, 1}.

S(X, Y) is any learning algorithm trained on the vectors X with classes Y, and in which all the parameters of S are set.
D is the radius of the smallest sphere that encloses all the training points, and w is the weight vector of the hyperplane.
z is an unlabeled vector and d(z) ∈ {-1, 1} is its prediction.
X_z is the set of training points X to which z is appended: X_z = {X, z}.

Y_{-1} is the set of labels Y to which (-1) is appended: Y_{-1} = {Y, -1}, and Y_{1} is the set of labels Y to which (1) is appended: Y_{1} = {Y, 1}.
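To make this notation concrete, the sketch below shows one way the augmented sets might be built and used in code; the helper augmented_models is hypothetical, and instantiating S as a linear SVM is an illustrative assumption.

```python
import numpy as np
from sklearn.svm import SVC

def augmented_models(X, Y, z):
    """Train S(X_z, Y_-1) and S(X_z, Y_1): the same learner fitted on the
    training set with z appended under each of the two candidate labels."""
    X_z = np.vstack([X, z])                  # X_z = {X, z}
    models = {}
    for label in (-1, 1):
        Y_label = np.append(Y, label)        # Y_-1 or Y_1
        models[label] = SVC(kernel="linear", C=1e6).fit(X_z, Y_label)
    return models

# Tiny illustrative data set and unlabeled query point (arbitrary values).
X = np.array([[-2.0, -1.5], [-1.8, -2.2], [2.1, 1.9], [1.7, 2.3]])
Y = np.array([-1, -1, 1, 1])
z = np.array([[0.4, 0.6]])

models = augmented_models(X, Y, z)
print({label: m.n_support_.tolist() for label, m in models.items()})
```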
Based on the notation presented, X_z represents a new training set that contains X to which z is added, and Y_{-1} is the set of labels formed by Y to which (-1) is added. This means that we are assuming that z belongs to class (-1).