Substituting (3.55) and (3.56) in (3.53) we finally obtain the expression (3.46) of $V_{R_2}(E)$.
Remarks:
1. Note that the supports of $f_{E|-1}(e)$ and $f_{E|1}(e)$ are disjoint: $]-\pi, 0[$ and $]0, \pi[$, respectively. This is a consequence of using a sigmoidal activation function, and contrasts with what happened with the linear discriminant (a numerical illustration is given after these remarks).
2. The expression of $V_{R_2}(E)$ has terms $m_t$ containing the bias $w_0$. Therefore, and also in contrast with the linear discriminant, the MEE algorithm is now always able to adjust the bias of the decision borders. The same remark applies to the EEs of formulas (3.44) and (3.45).
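As a quick numerical illustration of Remark 1, the sketch below assumes the arctan perceptron output is $\arctan(\mathbf{w}^T\mathbf{x} + w_0)$ with targets $t = \pm\pi/2$ (a convention consistent with the stated supports, not confirmed by this excerpt); the weights, bias, and class means are arbitrary choices. It confirms that the class-conditional errors fall in $]-\pi, 0[$ and $]0, \pi[$:

```python
import numpy as np

rng = np.random.default_rng(0)
w, w0 = np.array([1.0, -0.5]), 0.2  # arbitrary weights and bias
mu = {-1: np.array([-1.0, 0.0]), 1: np.array([1.0, 0.0])}  # arbitrary class means

for t in (-1, 1):
    # Gaussian inputs with identity covariance and mean mu[t]
    x = rng.normal(size=(100_000, 2)) + mu[t]
    # error = target - arctan output, with targets +/- pi/2
    e = t * np.pi / 2 - np.arctan(x @ w + w0)
    print(f"class {t:+d}: errors range over ({e.min():.3f}, {e.max():.3f})")
# class -1 errors stay inside ]-pi, 0[; class +1 errors inside ]0, pi[
```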
It is a well-known fact that, for two-class problems with Gaussian inputs having equal covariance matrix $\Sigma$, the Bayes linear decision function for equal priors [182, 76] is
$$d(\mathbf{x}) = \left[\mathbf{x} - \tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})\right]^T \Sigma^{-1} (\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1}); \qquad (3.57)$$
in other words, the linear discriminant $d(\mathbf{x}) = 0$ passes through the point lying half-way between the means, $\tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})$, and is orthogonal to $\Sigma^{-1}(\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1})$. We now analyze, in the next theorem, how the arctan perceptron behaves with respect to this property.
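Before the theorem, a minimal sanity check of (3.57); the means and covariance are arbitrary choices. It verifies that $d(\mathbf{x})$ vanishes at the half-way point of the means and along any direction orthogonal to $\Sigma^{-1}(\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1})$:

```python
import numpy as np

mu_m1, mu_p1 = np.array([0.0, 1.0]), np.array([2.0, 3.0])  # class means (arbitrary)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])                 # shared covariance (arbitrary)

v = np.linalg.solve(Sigma, mu_p1 - mu_m1)  # Sigma^{-1} (mu_1 - mu_{-1})
mid = 0.5 * (mu_m1 + mu_p1)                # half-way point of the means

def d(x):
    """Bayes linear decision function (3.57) for equal priors."""
    return (x - mid) @ v

print(d(mid))                  # 0: the border passes through the midpoint
u = np.array([-v[1], v[0]])    # a direction orthogonal to v in 2-D
print(d(mid + 3.7 * u))        # ~0: the border d(x) = 0 is orthogonal to v
```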
Theorem 3.3. The Bayes linear discriminant for equal-prior two-class prob-
lems, with Gaussian inputs having the same covariance Σ , is a critical point
of Rényi's quadratic entropy of the arctan perceptron.
Proof. We first note that, given a two-class problem with Gaussian inputs $S$ having the same covariance $\Sigma$, one can always transform it into an equivalent problem with covariance $I$ by applying the whitening transformation to the inputs (see e.g. [76]): $\mathbf{x} = (\Phi\Lambda^{-1/2})^T \mathbf{s}$, where $\mathbf{s}$ is a (vector) instance of the multi-dimensional r.v. $S$, $\Phi$ and $\Lambda$ are the eigenvector and eigenvalue matrices of $\Sigma$, respectively, and $\mathbf{x}$ is the corresponding instance of the whitened multi-dimensional r.v. $X$. Since the whitening transformation is a linear transformation, the position of the critical points of a continuous function of $S$, like the one implemented by the arctan perceptron, is also linearly transformed by $(\Phi\Lambda^{-1/2})^T$. For the whitened inputs the Bayes discriminant is orthogonal to the vector linking the means, $\boldsymbol{\mu}_{1} - \boldsymbol{\mu}_{-1}$.
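A minimal numerical sketch of the whitening step (the covariance matrix and sample size are arbitrary choices): after $\mathbf{x} = (\Phi\Lambda^{-1/2})^T \mathbf{s}$, the sample covariance of the transformed inputs is close to $I$:

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])  # arbitrary covariance of S
S = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)

lam, Phi = np.linalg.eigh(Sigma)   # eigenvalues Lambda and eigenvectors Phi
A = Phi @ np.diag(lam ** -0.5)     # whitening matrix Phi Lambda^{-1/2}
X = S @ A                          # x = A^T s, applied to every sample row

print(np.cov(X, rowvar=False).round(3))  # ~ identity matrix
```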
We then apply the previous Lemma to the whitened-inputs problem, and proceed to study the critical points of $V(\mathbf{w}, w_0) \equiv 8\sqrt{\pi}\, V_{R_2}(E; \mathbf{w}, w_0)$:
$$V(\mathbf{w}, w_0) = \sum_t \frac{1}{\sigma}\left(1 + m_t^2 + \frac{\sigma^2}{2}\right) = \frac{1}{\sigma}\left(1 + m_{-1}^2 + 1 + m_{1}^2\right) + \sigma, \qquad (3.58)$$
with $\sigma = \sqrt{\mathbf{w}^T I \mathbf{w}} = \|\mathbf{w}\|$. Let us take the partial derivatives. For the bias term:
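The derivative itself is truncated here; a sketch of how the computation would proceed, assuming $m_t$ denotes the class-conditional mean $\mathbf{w}^T \boldsymbol{\mu}_t + w_0$ of the perceptron pre-activation (an assumption not confirmed by this excerpt), so that $\partial m_t / \partial w_0 = 1$: from (3.58),
$$\frac{\partial V}{\partial w_0} = \frac{2(m_{-1} + m_1)}{\sigma},$$
which vanishes iff $m_{-1} + m_1 = \mathbf{w}^T(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1}) + 2 w_0 = 0$, i.e., iff the border $\mathbf{w}^T \mathbf{x} + w_0 = 0$ passes through the half-way point $\tfrac{1}{2}(\boldsymbol{\mu}_{-1} + \boldsymbol{\mu}_{1})$ of the means, as in the Bayes discriminant (3.57).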