Classifiers can also be differentiated with respect to the nature of the data. Most standard classifiers are applied to categorise objects. This typically amounts to representing objects as points in a feature space of appropriate dimension, which is typically a Euclidean vector space. However, it is also possible to use a classification approach to analyse processes rather than objects. Data from such processes typically come in the form of time series.
Although it is possible to unfold time series in time, use the resulting high-dimensional vector space representation, and hence apply static classifiers, a number of dynamic classification techniques have also been developed to deal with the temporal dynamics of processes.
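The unfolding mentioned above can be sketched as a sliding window over the series, turning each window into a fixed-length lag vector that a static classifier can consume. The window length and series values below are illustrative assumptions, not taken from the text:

```python
# Sketch of "unfolding" a time series into fixed-length lag vectors
# so that a static (vector-space) classifier can be applied.

def unfold(series, window):
    """Turn a time series into overlapping lag vectors of length `window`."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]

series = [0.1, 0.4, 0.3, 0.8, 0.6, 0.9]
vectors = unfold(series, window=3)
# Each element of `vectors` is now a point in a 3-dimensional feature space.
```

Note that this representation discards nothing inside the window, but any dependence on events further back than the window length is lost, which is one motivation for the dynamic techniques mentioned above.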
Another way to categorise classifiers considers the nature of the decision boundary they can construct. Linear classifiers construct hyperplanes as their decision boundaries, whereas nonlinear classifiers can generate more flexible hypersurfaces in order to achieve class separation.
Once the decision boundary is constructed, using explicit or implicit class membership information, unseen data can be fed into the classifier, which will then suggest the most likely class for a given datum (i.e. on which side of the decision boundary it falls).
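For the linear case this decision rule reduces to checking the sign of a hyperplane equation. A minimal sketch, assuming the weight vector and bias have already been learned (the values here are purely illustrative):

```python
# A linear classifier's decision rule: the hyperplane w.x + b = 0 separates
# the feature space, and an unseen point is assigned a class according to
# which side of the hyperplane it falls on.

def predict(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = [2.0, -1.0], 0.5            # assumed, already-trained parameters
print(predict([1.0, 1.0], w, b))   # score = 2 - 1 + 0.5 = 1.5, so class +1
print(predict([-1.0, 2.0], w, b))  # score = -2 - 2 + 0.5 = -3.5, so class -1
```

A nonlinear classifier differs only in that the score is a more flexible function of x than the affine form used here.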
The subsequent classification performance will depend on the nature of the decision boundary a given classifier architecture can realise, on the properties of the training set, and on how the training process was conducted. Any finite training set will by necessity contain only limited information about the class structure of the data. This is because of the inherent error and incompleteness of the representation of the classified objects in the feature space, as well as the finite data size. Moreover, the dataset may actually contain some peculiar characteristics which are specific to this finite sample rather than representative of the entire class. Thus, the training of the classifier must be monitored in order to arrive at a decision boundary that represents true class information rather than overfitting to the specific dataset. At the same time, how well the true decision boundary can be captured depends on the classifier complexity.
These conflicting constraints can be formalised within the framework of the mean squared error cost, which can be shown to decompose into three terms corresponding, respectively, to the noise, the bias, and the variance, where only the bias (the constraints implied by the particular class of decision boundaries supported by the classifier) and the variance (the dependence on the training set used) can be minimised. Unfortunately, the bias-variance trade-off implies that it is typically not possible to minimise both simultaneously; increasing classifier flexibility by enlarging the class of decision boundary functions it can represent typically makes it more prone to overfitting. On the other hand, more robust classifiers that are not sensitive to the peculiarities of the data tend to have low complexity (a highly constrained decision boundary form).
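The three-term decomposition described above is conventionally written as follows, using f for the true target function, \hat{f} for the trained classifier's estimate, and \sigma^2 for the irreducible noise (the notation is standard rather than taken from the text):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
```

The expectation is taken over training sets; the noise term is irreducible, which is why only the bias and variance terms are available for minimisation.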
In order to strike the balance implied by the bias-variance trade-off, the preparation of the classifier is often performed in stages, supported by splitting the data into several sets: the training subset is used to adjust the classifier parameters, the validation set is used to monitor classifier progress or complexity, and the test set is used to evaluate the finally selected classifier.
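The three-way split described above can be sketched as follows; the 60/20/20 fractions and the shuffling seed are assumptions for illustration, not prescriptions from the text:

```python
# Split data into training, validation, and test subsets: the training set
# adjusts the classifier parameters, the validation set monitors progress
# or complexity, and the test set evaluates the finally selected classifier.
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split `data` into (train, validation, test) subsets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = three_way_split(range(10))
# With 10 items: 6 for training, 2 for validation, 2 held out for testing.
```

Shuffling before splitting matters: it prevents any ordering in the original dataset (e.g. grouping by class) from leaking systematic differences between the three subsets.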