Do risk functionals behave similarly with respect to the classifier problem, i.e., to the $\min P_e$ issue? The answer to this question is not easy, even when we restrict the classes of $(X, T)$ distributions and the classifier families under consideration. On one hand, none of the previously discussed risk functionals provides, in general, the $\min P_e$ solution (although they can achieve it in particular cases); on the other hand, there is no theoretical evidence precluding the existence of a risk functional that would always provide the $\min P_e$ solution.
We have seen in Sects. 2.1.1 and 2.1.2 that both MMSE and MCE, although theoretically able to produce estimates of posterior probabilities (and therefore of attaining the optimal Bayes error, $P_e(Z_{Bayes})$) provided some restrictive conditions are satisfied, are unable to achieve that in practice, mainly because at least some of the conditions, such as an arbitrarily complex classifier architecture or independence of the target components, are unrealistically restrictive. But what really interests us is not the attainment of $P_e(Z_{Bayes})$, but of the minimum probability of error for some classifier function family, $Z_W$. This issue, as far as we know, has never been studied for MMSE and MCE. We will present later in the book some results on this issue for MEE.
Is it possible to conceive data classification problems where MCE and MEE perform better than MMSE? And where MEE outperforms both MCE and MMSE? The answer to these questions is affirmative, as we shall now show with a simple example of a family of data classification problems where, for an infinite subset of the family, MEE provides the correct solution whereas MMSE and MCE do not [150, 219].
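As a concrete reference for what is being compared, the sketch below (ours, not code from the text) estimates the three empirical risks from targets $t \in \{-1, 1\}$ and classifier outputs $y \in (-1, 1)$; the $\pm 1$-target cross-entropy mapping and the Parzen bandwidth `h` are assumptions, not prescriptions from this book.

```python
import numpy as np

def mse_risk(t, y):
    """Empirical mean squared error (the quantity MMSE minimizes)."""
    return np.mean((t - y) ** 2)

def ce_risk(t, y, eps=1e-12):
    """Empirical cross-entropy (MCE): recode +/-1 targets and outputs to [0, 1]."""
    p = (1 + y) / 2                  # probability assigned to class +1
    q = (1 + t) / 2                  # target recoded to {0, 1}
    return -np.mean(q * np.log(p + eps) + (1 - q) * np.log(1 - p + eps))

def ee_risk(t, y, h=0.5):
    """Renyi quadratic entropy of the errors e = t - y (what MEE minimizes),
    estimated with a Gaussian Parzen window of bandwidth h."""
    e = t - y
    d = e[:, None] - e[None, :]      # all pairwise error differences
    # information potential: mean Gaussian kernel over all error pairs
    v = np.mean(np.exp(-d ** 2 / (4 * h ** 2))) / (2 * h * np.sqrt(np.pi))
    return -np.log(v)
```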
Example 2.7. Let us consider a family of two-class datasets in bivariate space $\mathbb{R}^2$, target space $T = \{-1, 1\}$. We denote the input vectors by $\mathbf{x} = [x_1\ x_2]^T$, and assume the following marginal and independent PDFs:

$$f_1(x_1) = \frac{1}{2}\left[u(x_1; a, 1) + u(x_1; b, a)\right], \qquad f_{-1}(x_1) = f_1(-x_1), \tag{2.49}$$

$$f_t(x_2) = u\left(x_2; -\frac{c}{2}, \frac{c}{2}\right), \tag{2.50}$$

where $u(x; a, b)$ is the uniform density in $[a, b]$. We further assume $a \in [0, 1[$, $b < a$, $c > 0$ and $P(1) = P(-1) = 1/2$.
Figure 2.10 shows a sample of 500 instances per class, randomly and independently drawn from the above distribution family for a particular parameter choice. We see that for suitable choices of the parameters we can obtain distributions combining a high concentration around $\{-1, 1\}$ with long tails.
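To make the construction concrete, the following sampler is a sketch of ours (the function name `sample_family` and the default values of a, b, c are illustrative, not the parameters used for Fig. 2.10):

```python
import numpy as np

def sample_family(n, a=0.9, b=-3.0, c=0.5, rng=None):
    """Draw n points per class from the family of Eqs. (2.49)-(2.50)."""
    rng = rng or np.random.default_rng()

    def x1_class1(m):
        # equal-weight mixture: narrow uniform on [a, 1] plus long tail on [b, a]
        narrow = rng.random(m) < 0.5
        return np.where(narrow, rng.uniform(a, 1.0, m), rng.uniform(b, a, m))

    x1 = np.concatenate([x1_class1(n), -x1_class1(n)])  # f_{-1}(x1) = f_1(-x1)
    x2 = rng.uniform(-c / 2, c / 2, 2 * n)              # Eq. (2.50), both classes
    t = np.concatenate([np.ones(n), -np.ones(n)])       # P(1) = P(-1) = 1/2
    return np.column_stack([x1, x2]), t

X, t = sample_family(500)  # 500 instances per class, as in Fig. 2.10
```

With $a$ close to 1 and $b$ strongly negative, most of each class concentrates near $\pm 1$ while the second mixture component produces the long tails mentioned above.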
Let us assume a classifier implementing the thresholded linear family $\vartheta = \{h(\varphi(\mathbf{x})) = h(\mathbf{w}^T\mathbf{x});\ \mathbf{w} \in \mathbb{R}^2\}$, where $h(\cdot)$ is the Heaviside function. The decision surfaces are straight lines passing through the origin ($\varphi(\mathbf{x}) > 0$ assigns $\mathbf{x}$ to $\omega_1$; otherwise to $\omega_{-1}$). The classifier problem consists of selecting the straight line providing the $\min P_e$ solution.
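Since every decision line passes through the origin, the family is in effect parametrized by a single angle, so the empirical error of each candidate line can be scanned directly. A minimal sketch of ours, reusing the hypothetical `sample_family` above (grid and sample sizes are arbitrary choices):

```python
import numpy as np

# scan both labelings of every line through the origin, w = (cos a, sin a),
# and estimate the probability of error of each on a large sample
X, t = sample_family(100_000)
alphas = np.linspace(0.0, 2 * np.pi, 2_000, endpoint=False)
pe = [np.mean(np.where(X @ np.array([np.cos(a), np.sin(a)]) > 0, 1, -1) != t)
      for a in alphas]
best = int(np.argmin(pe))
print(f"empirical min Pe ~ {pe[best]:.4f} at w angle {alphas[best]:.3f} rad")
```

This empirical $\min P_e$ is the yardstick against which the MMSE, MCE, and MEE solutions for this family are judged.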