and

$$ f_{E|t}(e) = f_{Y|t}(t-e) = g\left(e;\, t - w^T\mu_{X|t} - w_0,\; w^T\Sigma_{X|t}\,w\right). \tag{3.31} $$
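Equation (3.31) says that, given class $t$, the error $E = t - (w^T X + w_0)$ is Gaussian with mean $t - w^T\mu_{X|t} - w_0$ and variance $w^T\Sigma_{X|t}w$. The following Monte Carlo sketch checks this for one class. It assumes a diagonal class covariance $\mathrm{diag}(var)$ and uses hypothetical values for $\mu$, $w$ and $w_0$; the helper names are illustrative only.

```python
import random
import statistics

def theoretical_error_params(t, mu, var, w, w0):
    """Mean and variance of E|t from (3.31), for X|t ~ N(mu, diag(var)) in 2-D."""
    mean = t - (w[0] * mu[0] + w[1] * mu[1]) - w0
    variance = w[0] ** 2 * var[0] + w[1] ** 2 * var[1]
    return mean, variance

def sample_errors(t, mu, var, w, w0, n, seed=0):
    """Monte Carlo draws of E = t - (w^T x + w0) with x ~ N(mu, diag(var))."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = [rng.gauss(mu[k], var[k] ** 0.5) for k in range(2)]
        out.append(t - (w[0] * x[0] + w[1] * x[1] + w0))
    return out

# Hypothetical class t = 1 parameters and linear discriminant.
mu, var, w, w0 = [2.0, 0.0], [1.0, 4.0], [0.5, -0.3], 0.2
m_th, v_th = theoretical_error_params(1, mu, var, w, w0)
e = sample_errors(1, mu, var, w, w0, n=100_000)
print(m_th, statistics.fmean(e))      # sample mean should be close to m_th
print(v_th, statistics.pvariance(e))  # sample variance should be close to v_th
```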
We now proceed to compute the information potential as in Example 3.1:
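$$ \int_{-\infty}^{+\infty} f_{E|t}^2(e)\,de = g\left(0;\, 0,\, 2\sigma_{Y|t}^2\right) \quad\text{and}\quad \int_{-\infty}^{+\infty} f_{E|-1}(e)\,f_{E|1}(e)\,de = g\left(0;\, 2-d,\, 2\sigma_m^2\right), \tag{3.32} $$

with $\sigma_{Y|t}^2 = w^T\Sigma_{X|t}\,w$, $\sigma_m^2 = \left(\sigma_{Y|-1}^2 + \sigma_{Y|1}^2\right)/2$, and $d = w^T\left(\mu_{X|1} - \mu_{X|-1}\right)$.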
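Both integrals in (3.32) follow from the standard identity $\int g(e; a, v_1)\,g(e; b, v_2)\,de = g(0;\, b-a,\, v_1+v_2)$. The sketch below checks them numerically with a plain trapezoidal rule. The projected means $w^T\mu_{X|t}$, the variances $\sigma_{Y|t}^2$ and $w_0$ are hypothetical numbers chosen only for illustration.

```python
import math

def g(x, mean, var):
    """Univariate Gaussian PDF g(x; mean, var), parametrized by the variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def trapezoid(f, lo=-30.0, hi=30.0, n=20_000):
    """Composite trapezoidal rule on [lo, hi]; fine for smooth, fast-decaying f."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * step) for i in range(1, n))
    return total * step

# Hypothetical projections: wmu[t] = w^T mu_{X|t}, var_y[t] = sigma^2_{Y|t}.
wmu = {-1: 0.4, 1: 1.9}
var_y = {-1: 0.8, 1: 1.5}
w0 = -0.1

# Means of E|t from (3.31): t - w^T mu_{X|t} - w0.
mean_e = {t: t - wmu[t] - w0 for t in (-1, 1)}
d = wmu[1] - wmu[-1]
sigma_m2 = (var_y[-1] + var_y[1]) / 2.0

# First integral of (3.32), for t = -1.
self_num = trapezoid(lambda e: g(e, mean_e[-1], var_y[-1]) ** 2)
self_closed = g(0.0, 0.0, 2.0 * var_y[-1])

# Second integral of (3.32): cross term between the two classes.
cross_num = trapezoid(lambda e: g(e, mean_e[-1], var_y[-1]) * g(e, mean_e[1], var_y[1]))
cross_closed = g(0.0, 2.0 - d, 2.0 * sigma_m2)

print(self_num, self_closed)
print(cross_num, cross_closed)
```

Note that the difference of the two error means, $(1 - w^T\mu_{X|1}) - (-1 - w^T\mu_{X|-1}) = 2 - d$, does not involve $w_0$.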
Hence, for equal priors:
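$$ V_{R_2}\left(d, \sigma_{Y|-1}, \sigma_{Y|1}\right) = \frac{1}{8\sqrt{\pi}}\left(\frac{1}{\sigma_{Y|-1}} + \frac{1}{\sigma_{Y|1}} + \frac{2}{\sigma_m}\exp\left(-\frac{(2-d)^2}{4\sigma_m^2}\right)\right). \tag{3.33} $$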
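With equal priors, $f_E = \tfrac12 f_{E|-1} + \tfrac12 f_{E|1}$, so $V_{R_2} = \int f_E^2\,de$ expands into the two self terms and the cross term of (3.32) with weights $1/4$, $1/4$ and $1/2$. The sketch below compares the closed form (3.33) with a direct numerical integral of $f_E^2$ for several values of $w_0$; the parameter values are hypothetical, and the natural logarithm is assumed for $H_{R_2} = -\ln V_{R_2}$.

```python
import math

def v_r2_closed(d, s_m1, s_p1):
    """Equal-prior information potential (3.33); s_m1, s_p1 are sigma_{Y|-1}, sigma_{Y|1}."""
    sigma_m = math.sqrt((s_m1 ** 2 + s_p1 ** 2) / 2.0)
    return (1.0 / (8.0 * math.sqrt(math.pi))) * (
        1.0 / s_m1
        + 1.0 / s_p1
        + (2.0 / sigma_m) * math.exp(-((2.0 - d) ** 2) / (4.0 * sigma_m ** 2))
    )

def v_r2_numeric(wmu_m1, wmu_p1, w0, s_m1, s_p1, lo=-30.0, hi=30.0, n=20_000):
    """Trapezoidal integral of f_E(e)^2, f_E being the equal-prior mixture of E|t."""
    def g(x, mean, sd):
        return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    a_m1 = -1.0 - wmu_m1 - w0  # mean of E|t=-1, from (3.31)
    a_p1 = 1.0 - wmu_p1 - w0   # mean of E|t=+1, from (3.31)

    def f_sq(e):
        return (0.5 * g(e, a_m1, s_m1) + 0.5 * g(e, a_p1, s_p1)) ** 2

    step = (hi - lo) / n
    total = 0.5 * (f_sq(lo) + f_sq(hi)) + sum(f_sq(lo + i * step) for i in range(1, n))
    return total * step

# Hypothetical values of w^T mu_{X|-1}, w^T mu_{X|1}, sigma_{Y|-1}, sigma_{Y|1}.
wmu_m1, wmu_p1, s_m1, s_p1 = 0.4, 1.9, 0.9, 1.2
d = wmu_p1 - wmu_m1
closed = v_r2_closed(d, s_m1, s_p1)
# The numerical value should match (3.33) whatever w0 is.
numeric = {w0: v_r2_numeric(wmu_m1, wmu_p1, w0, s_m1, s_p1) for w0 in (-2.0, 0.0, 3.0)}
h_r2 = -math.log(closed)  # Renyi's quadratic entropy, H_R2 = -ln V_R2
print(closed, numeric, h_r2)
```

Changing $w_0$ moves both error means by the same amount, so only the shape of $f_E$, and hence $V_{R_2}$, is determined by $d$, $\sigma_{Y|-1}$ and $\sigma_{Y|1}$.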
It is clear that Rényi's quadratic entropy does not depend on $w_0$. This is a direct consequence of the invariance of entropy to translations, since from (3.31) we observe that, for equal priors,

$$ E[E] = -\frac{1}{2}\left(w^T\mu_{X|-1} + w^T\mu_{X|1}\right) - w_0 , $$

so that $w_0$ only shifts the error PDF along the $e$ axis. For the same reason, Shannon's entropy and $\alpha$-order Rényi's entropies are also insensitive to the constant term $w_0$.
Things are different, however, when a linear classifier is trained with gradient descent using empirical entropies. Away from the convergent solution, the $e_i - e_j$ deviations in formula (3.3) are scattered, and the estimate $\hat{f}(e)$ does not usually reproduce well a sum of Gaussians with the above mean value. As a consequence, the bias term of the solution will undergo adjustments. Near the convergent solution, with the $e_i - e_j$ deviations crowded into a small interval, the estimate $\hat{f}(e)$ closely approximates the theoretical error PDF, and the insensitivity to bias adjustments comes into play.
This empirical MEE behavior is illustrated in the following bivariate two-class example, where gradient descent on Shannon's entropy is used.
Example 3.3. Consider two normally distributed class-conditional PDFs, $g(x; \mu_t, \Sigma_t)$, with

$$ \mu_{-1} = [0\ \ 0]^T, \quad \mu_1 = [2\ \ 0]^T, \quad \Sigma_{-1} = \Sigma_1 = I . $$
Independent training and test datasets with $n = 250$ instances (125 instances per class) were generated, and the Shannon MEE algorithm was applied with $h = 1$ and $\eta = 0.001$ [2]. Note that, according to formula (E.19), the optimal bandwidth for the number of instances being used is $h_{IMSE} = 0.4$. We are, therefore, using fat estimation of the error PDF, that is, a bandwidth well above $h_{IMSE}$.

[2] From now on, the indicated $\eta$ values are initial values of an adaptive rule to be described in Sect. 6.1.1.
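The data setup of Example 3.3 can be sketched as below. This is not the Shannon MEE training algorithm itself; it only generates a training set with the stated class-conditional PDFs, computes the errors $e = t - (w^T x + w_0)$ for a hand-picked, hypothetical weight vector, and compares a Gaussian-kernel Parzen estimate of the error PDF at the fat bandwidth $h = 1$ and at $h_{IMSE} = 0.4$. The resubstitution Shannon estimate $-(1/n)\sum_i \ln \hat{f}(e_i)$, the seed, and all function names are assumptions made for illustration.

```python
import math
import random

def make_dataset(n_per_class, rng):
    """Example 3.3 classes: mu_{-1} = [0, 0], mu_1 = [2, 0], identity covariances."""
    data = []
    for t, mu in ((-1, (0.0, 0.0)), (1, (2.0, 0.0))):
        for _ in range(n_per_class):
            data.append((rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0), t))
    return data

def parzen_pdf(e, samples, h):
    """Gaussian-kernel Parzen estimate of the error PDF at e, bandwidth h."""
    c = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((e - ei) / h) ** 2) for ei in samples)

def shannon_resub(samples, h):
    """Assumed resubstitution estimate of Shannon's entropy: -(1/n) sum ln f_hat(e_i)."""
    return -sum(math.log(parzen_pdf(ei, samples, h)) for ei in samples) / len(samples)

rng = random.Random(1)
train = make_dataset(125, rng)   # n = 250 instances, 125 per class
w, w0 = (1.0, 0.0), -1.0         # hypothetical weights, not the MEE solution
errors = [t - (w[0] * x0 + w[1] * x1 + w0) for x0, x1, t in train]

for h in (1.0, 0.4):             # fat bandwidth vs h_IMSE quoted in the text
    print(h, shannon_resub(errors, h))
```

With the larger bandwidth each kernel spreads over a wider interval, so the estimated error PDF is smoother than the one obtained with $h_{IMSE}$.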
 