Therefore, the probability of the classifier having generated class $j$ is given by $w_j$, which is the $j$th element of its parameter vector $w \in \mathbb{R}^{D_Y}$.
5.5.1 A Quality Measure for Classification
Good classifiers are certain about which classes they are associated with. This
implies that one aims at finding classifiers that have a high probability associated
with a single class, and low probability for all other classes.
For a two-class problem, the relation $w_2 = 1 - w_1$ is required to hold to satisfy $\sum_j w_j = 1$. In such a case, the model's variance $\mathrm{var}(y|w) = w_1(1 - w_1)$ is a good measure of the model's quality, as $\mathrm{var}(y|w) = 0$ for $w_1 = 0$ or $w_2 = 0$, and it has its maximum $\mathrm{var}(y|w) = 0.25$ at $w_1 = 0.5$, which is the point of maximum uncertainty.
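The behaviour of this quality measure at the extreme and intermediate values can be checked numerically. Below is a minimal sketch (the function name is illustrative, not from the text):

```python
# Sketch of the two-class quality measure var(y|w) = w1 * (1 - w1).

def two_class_variance(w1: float) -> float:
    """Bernoulli variance of the class label given the parameter w1."""
    return w1 * (1.0 - w1)

# A certain classifier (w1 = 0 or w1 = 1) has zero variance ...
print(two_class_variance(0.0))  # 0.0
print(two_class_variance(1.0))  # 0.0
# ... while the point of maximum uncertainty yields the maximum 0.25.
print(two_class_variance(0.5))  # 0.25
```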
The same principle can be extended to multi-class problems by taking the product of the elements of $w$, denoted $\tau^{-1}$ and given by
$$\tau^{-1} = \prod_{j=1}^{D_Y} w_j. \tag{5.78}$$
In the three-class case, for example, the worst performance occurs at $w_1 = w_2 = w_3 = 1/3$, at which point $\tau^{-1}$ is maximised. Note that, unlike for linear regression, $\tau^{-1}$ is formally *not* the precision estimate.
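The product measure (5.78) and its three-class worst case can be sketched as follows (the function name is illustrative, not from the text):

```python
import math

def tau_inverse(w):
    """Multi-class uncertainty measure: the product of the elements of w, (5.78)."""
    return math.prod(w)

uniform = [1/3, 1/3, 1/3]   # worst case: all classes equally likely
certain = [1.0, 0.0, 0.0]   # fully certain about one class

print(tau_inverse(uniform))  # 1/27, the maximum over the simplex
print(tau_inverse(certain))  # 0.0
```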
As $\tau^{-1}$ is easily computed from $w$, its estimate does not need to be maintained separately. Thus, the description of the batch and incremental learning approaches deals exclusively with the estimation of $w$.
5.5.2 Batch Approach for Classification
Recall that the aim of a classifier is to solve (4.24), which, together with (5.77), results in the constrained optimisation problem
$$\max_{w} \sum_{n=1}^{N} \sum_{j=1}^{D_Y} m(x_n) y_{nj} \ln w_j, \tag{5.79}$$
$$\text{subject to} \quad \sum_{j=1}^{D_Y} w_j = 1.$$
Using the Lagrange multiplier $\lambda$ to express the constraint $1 - \sum_j w_j = 0$, the aim becomes to maximise
$$\sum_{n=1}^{N} \sum_{j=1}^{D_Y} m(x_n) y_{nj} \ln w_j + \lambda \left( 1 - \sum_{j=1}^{D_Y} w_j \right). \tag{5.80}$$
Differentiating the above with respect to $w_j$ for some $j$, setting it to 0, and solving for $w_j$ results in the estimate
$$w_j = \lambda^{-1} \sum_{n=1}^{N} m(x_n) y_{nj}. \tag{5.81}$$
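As a minimal sketch of this batch estimate: if the targets use 1-of-$D_Y$ coding so that $\sum_j y_{nj} = 1$, substituting the estimate into the constraint gives $\lambda = \sum_n m(x_n)$, which makes $w$ a vector of matching-weighted class frequencies. The function name and example data below are illustrative, not from the text:

```python
import numpy as np

def batch_classifier_weights(M, Y):
    """Batch estimate of w following (5.81).

    M: matching values m(x_n), shape (N,)
    Y: targets in 1-of-D_Y coding, shape (N, D_Y)
    Assumes sum_j y_nj = 1, so lambda = sum_n m(x_n) satisfies the constraint.
    """
    lam = M.sum()  # value of the Lagrange multiplier
    return (M[:, None] * Y).sum(axis=0) / lam

# Four observations, two classes, with partial matching of the middle ones.
M = np.array([1.0, 0.5, 1.0, 0.0])
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
w = batch_classifier_weights(M, Y)
print(w)  # matching-weighted class frequencies; elements sum to 1
```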