$$P_{-1} = P(Z = 1,\, T = -1) = P(\mathbf{w}^T\mathbf{x} + w_0 \ge 0,\, T = -1) = q\,\bigl(1 - F_{Y|-1}(0)\bigr), \qquad (4.53)$$

$$P_{1} = P(Z = -1,\, T = 1) = P(\mathbf{w}^T\mathbf{x} + w_0 < 0,\, T = 1) = p\,F_{Y|1}(0), \qquad (4.54)$$

where $F_{Y|t}(0) = P(Y \le 0 \mid T = t)$ is the conditional distribution value at the origin of the univariate r.v. $Y = \mathbf{w}^T\mathbf{x} + w_0$.
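For a concrete reading of (4.53) and (4.54), the sketch below evaluates both class error probabilities when the class conditionals are assumed Gaussian, so that $Y = \mathbf{w}^T\mathbf{x} + w_0$ is Gaussian within each class and $F_{Y|t}(0)$ is available in closed form. All numbers (priors, means, covariances, and the hyperplane parameters) are hypothetical, chosen only to make the example runnable.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class Gaussian setup (illustrative values only).
q, p = 0.6, 0.4                                   # priors P(T=-1), P(T=1)
mu_neg, mu_pos = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S_neg = np.eye(2)
S_pos = np.array([[1.5, 0.3], [0.3, 1.0]])

# Hyperplane parameters w, w0 (also hypothetical).
w, w0 = np.array([1.0, 0.5]), -1.5

# Within each class, Y = w^T x + w0 is Gaussian with these moments.
m_neg, s_neg = w @ mu_neg + w0, np.sqrt(w @ S_neg @ w)
m_pos, s_pos = w @ mu_pos + w0, np.sqrt(w @ S_pos @ w)

# Eq. (4.53): P_{-1} = q (1 - F_{Y|-1}(0));  Eq. (4.54): P_{1} = p F_{Y|1}(0).
P_neg = q * (1.0 - norm.cdf(0.0, loc=m_neg, scale=s_neg))
P_pos = p * norm.cdf(0.0, loc=m_pos, scale=s_pos)
print(f"P_-1 = {P_neg:.4f}, P_1 = {P_pos:.4f}, total error = {P_neg + P_pos:.4f}")
```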
A direct generalization of Theorem 4.2 can now be stated [212]:
Theorem 4.3. In the two-class multivariate problem, if the optimal set of parameters $\mathbf{w}^* = [w_1 \,\ldots\, w_d\; w_0]^T$ of a separating hyperplane constitutes a critical point of the error entropy, then the error probabilities of each class at $\mathbf{w}^*$ are equal.
Proof. We start by noticing that the multivariate classification problem can be viewed as a univariate one using $u = \mathbf{w}^T\mathbf{x}$, the projection of $\mathbf{x}$ onto $\mathbf{w}$. From an initial input (overall) distribution represented by a density $f_X(\mathbf{x}) = q f_{X|-1}(\mathbf{x}) + p f_{X|1}(\mathbf{x})$ we get, on the projected space, the distribution of the projected data given by $f_U(u) = q f_{U|-1}(u) + p f_{U|1}(u)$. The parameter $w_0$ then works as a Stoller split: a data instance is classified as $\omega_1$ if $u \ge -w_0$ and as $\omega_{-1}$ otherwise. From Theorem 4.1, one can assert that $q f_{U|-1}(-w_0) = p f_{U|1}(-w_0)$ at $\mathbf{w}^*$.
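A quick sanity check of this reduction, on synthetic data with hypothetical parameter values: classifying in the input space by the sign of $\mathbf{w}^T\mathbf{x} + w_0$ gives exactly the same decisions as thresholding the projection $u = \mathbf{w}^T\mathbf{x}$ at $-w_0$, which is the Stoller-split view used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample and hyperplane, just to exercise the equivalence.
X = rng.normal(size=(1000, 2))
w, w0 = np.array([1.0, 0.5]), -0.3

# Decision in the input space: sign of w^T x + w0.
z_multivariate = X @ w + w0 >= 0.0

# Decision on the projected variable u = w^T x, split at -w0.
u = X @ w
z_projected = u >= -w0

assert np.array_equal(z_multivariate, z_projected)  # identical decisions
```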
We rewrite the error probabilities of each class as
$$P_{-1} = q\,\bigl(1 - F_{U|-1}(-w_0)\bigr), \qquad P_{1} = p\,F_{U|1}(-w_0), \qquad (4.55)$$
and compute
$$\frac{\partial P_{-1}}{\partial w_0} = q\,f_{U|-1}(-w_0), \qquad \frac{\partial P_{1}}{\partial w_0} = -p\,f_{U|1}(-w_0). \qquad (4.56)$$
From (4.2),

$$\frac{\partial H_S}{\partial P_t} = \ln\frac{1 - P_{-1} - P_{1}}{P_t}, \qquad t \in \{-1, 1\},$$

the chain rule $\left(\frac{\partial H_S}{\partial w_0} = \frac{\partial H_S}{\partial P_{-1}}\frac{\partial P_{-1}}{\partial w_0} + \frac{\partial H_S}{\partial P_{1}}\frac{\partial P_{1}}{\partial w_0}\right)$ and the fact that $q f_{U|-1} = p f_{U|1}$ at $\mathbf{w}^*$ allow writing

$$\frac{\partial H_S}{\partial w_0}(\mathbf{w}^*) = 0 \qquad (4.57)$$

$$\Longleftrightarrow\quad p\,f_{U|1}(-w_0)\left[\ln\frac{1 - P_{-1} - P_{1}}{P_{-1}} - \ln\frac{1 - P_{-1} - P_{1}}{P_{1}}\right] = 0 \qquad (4.58)$$

$$\Longleftrightarrow\quad f_{U|1}(-w_0) = 0 \ \lor\ P_{-1} = P_{1}. \qquad (4.59)$$
Note that $f_{U|1}(-w_0) = 0$ iff the classes have distributions with disjoint supports (they are separable); but in this case $P_{-1} = P_{1} = 0$. Thus, in both cases $P_{-1} = P_{1}$ is a necessary condition.
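The theorem can be illustrated numerically under assumed Gaussian class conditionals on the projected axis (all values below are hypothetical). In the symmetric configuration used here, the minimum-error split and an interior critical point of the error entropy both occur at $w_0 = 0$; the sweep locates that critical point of $H_S$ with respect to $w_0$ and confirms that the two class error probabilities are equal there.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical symmetric setup on the projected axis u = w^T x:
# equal priors and mirrored Gaussian class conditionals.
q, p = 0.5, 0.5
mu_neg, mu_pos, s = -1.0, 1.0, 1.0

def error_probs(w0):
    """Per-class error probabilities (4.55) for the split u >= -w0."""
    P_neg = q * (1.0 - norm.cdf(-w0, loc=mu_neg, scale=s))  # P_{-1}
    P_pos = p * norm.cdf(-w0, loc=mu_pos, scale=s)          # P_{1}
    return P_neg, P_pos

def shannon_error_entropy(w0):
    P_neg, P_pos = error_probs(w0)
    probs = np.array([P_neg, P_pos, 1.0 - P_neg - P_pos])
    probs = probs[probs > 0.0]                              # convention: 0 ln 0 = 0
    return -np.sum(probs * np.log(probs))

w0_grid = np.linspace(-3.0, 3.0, 2000)
H = np.array([shannon_error_entropy(w0) for w0 in w0_grid])
dH = np.gradient(H, w0_grid)

# Interior critical points of H_S w.r.t. w0: sign changes of the derivative.
for i in np.where(np.diff(np.sign(dH)) != 0)[0]:
    P_neg, P_pos = error_probs(w0_grid[i])
    print(f"critical w0 ~ {w0_grid[i]:+.3f}: P_-1 = {P_neg:.4f}, P_1 = {P_pos:.4f}")
```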
 