Therefore, the probability of the classifier having generated class $j$ is given by $w_j$, which is the $j$th element of its parameter vector $\mathbf{w} \in \mathbb{R}^{D_Y}$.
5.5.1 A Quality Measure for Classification
Good classifiers are certain about which classes they are associated with. This
implies that one aims at finding classifiers that assign a high probability to a
single class, and a low probability to all other classes.
For a two-class problem, the relation $w_2 = 1 - w_1$ is required to hold to satisfy $\sum_j w_j = 1$. In such a case, the model's variance $\operatorname{var}(y \,|\, \mathbf{w}) = w_1 (1 - w_1)$ is a good measure of the model's quality, as $\operatorname{var}(y \,|\, \mathbf{w}) = 0$ for $w_1 = 0$ or $w_2 = 0$, and it has its maximum $\operatorname{var}(y \,|\, \mathbf{w}) = 0.25$ at $w_1 = 0.5$, which is the point of maximum uncertainty.
The same principle can be extended to multi-class problems by taking the product of the elements of $\mathbf{w}$, denoted $\tau^{-1}$ and given by

$$\tau^{-1} = \prod_{j=1}^{D_Y} w_j. \qquad (5.78)$$

In the three-class case, for example, the worst performance occurs at $w_1 = w_2 = w_3 = 1/3$, at which point $\tau^{-1}$ is maximised. Note that, unlike for linear regression, $\tau^{-1}$ is formally not the precision estimate.
As $\tau^{-1}$ is easily computed from $\mathbf{w}$, its estimate does not need to be maintained separately. Thus, the description of the batch and incremental learning approaches deals exclusively with the estimation of $\mathbf{w}$.
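As an illustration, the following sketch computes $\tau^{-1}$ from a parameter vector (the function name inverse_precision and the use of NumPy are choices made here, not notation from the text); for two classes it reduces to the variance $w_1(1 - w_1)$ discussed above:

```python
import numpy as np

def inverse_precision(w):
    """Quality measure tau^{-1} = prod_j w_j, as in Eq. (5.78).

    Lower values indicate a classifier that is more certain about a
    single class; the maximum occurs at the uniform vector w_j = 1/D_Y.
    """
    w = np.asarray(w, dtype=float)
    assert np.isclose(w.sum(), 1.0), "w must sum to 1"
    return np.prod(w)

# Two-class case: tau^{-1} = w1 * (1 - w1) = var(y | w)
print(inverse_precision([0.5, 0.5]))       # 0.25, maximum uncertainty
print(inverse_precision([0.9, 0.1]))       # 0.09, more certain
print(inverse_precision([1.0, 0.0]))       # 0.0, fully certain

# Three-class case: worst performance at w1 = w2 = w3 = 1/3
print(inverse_precision([1/3, 1/3, 1/3]))  # ~0.037, maximal
print(inverse_precision([0.8, 0.1, 0.1]))  # 0.008
```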
5.5.2 Batch Approach for Classification
Recall that the aim of a classifier is to solve (4.24), which, together with (5.77), results in the constrained optimisation problem

$$\max_{\mathbf{w}} \sum_{n=1}^{N} m(\mathbf{x}_n) \sum_{j=1}^{D_Y} y_{nj} \ln w_j, \qquad (5.79)$$

$$\text{subject to} \quad \sum_{j=1}^{D_Y} w_j = 1.$$
Using the Lagrange multiplier $\lambda$ to express the constraint $1 - \sum_j w_j = 0$, the aim becomes to maximise

$$\sum_{n=1}^{N} m(\mathbf{x}_n) \sum_{j=1}^{D_Y} y_{nj} \ln w_j + \lambda \left( 1 - \sum_{j=1}^{D_Y} w_j \right). \qquad (5.80)$$
Differentiating the above with respect to $w_j$ for some $j$, setting the result to 0, and solving for $w_j$ yields the estimate

$$w_j = \lambda^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n) y_{nj}. \qquad (5.81)$$

Combining this with the constraint $\sum_j w_j = 1$, and noting that each $\mathbf{y}_n$ in 1-of-$D_Y$ coding satisfies $\sum_j y_{nj} = 1$, fixes the multiplier at $\lambda = \sum_n m(\mathbf{x}_n)$, so that $w_j$ is the match-weighted empirical frequency of class $j$.
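A minimal sketch of this batch estimate follows (the function name batch_weighted_ml and the array layout are assumptions made here, not the text's notation):

```python
import numpy as np

def batch_weighted_ml(m, Y):
    """Batch ML estimate of w, as in Eq. (5.81).

    m : (N,)    matching values m(x_n) in [0, 1]
    Y : (N, DY) targets y_n in 1-of-DY coding

    Computes w_j = lambda^{-1} * sum_n m(x_n) y_nj, where the
    constraint sum_j w_j = 1 gives lambda = sum_n m(x_n), since
    each row of Y sums to 1.
    """
    m = np.asarray(m, dtype=float)
    Y = np.asarray(Y, dtype=float)
    lam = m.sum()
    return (m @ Y) / lam

# Example: 4 observations, 3 classes, partial matching
m = np.array([1.0, 0.5, 1.0, 0.0])   # the last observation is not matched
Y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
print(batch_weighted_ml(m, Y))       # [0.6, 0.4, 0.0], sums to 1
```

The unmatched observation contributes nothing to the estimate, so $\mathbf{w}$ reflects the class frequencies only over the inputs the classifier matches.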