as the inverse square root of the covariance matrix $\mathbf{C}_x = E\{\mathbf{x}\mathbf{x}^T\}$, which is written as $\mathbf{C}_x^{-1/2}$.
Given the eigenvalue decomposition of the covariance matrix

$$\mathbf{C}_x = \mathbf{E}\mathbf{D}\mathbf{E}^T \qquad (16.40)$$

with $\mathbf{E}$ the matrix of eigenvectors and $\mathbf{D}$ the diagonal matrix of eigenvalues,
the whitening matrix can be written as
WEDE
S
=
12
/
T
(16.41)
This whitening step is often performed in conjunction with the dimensionality
reduction operation by means of PCA, as we shall see later.
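As a concrete illustration, the whitening of Eqs. (16.40) and (16.41) can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the text: the function name whiten and the optional n_components argument (for the PCA dimensionality reduction mentioned above) are our own choices, and in the reduced case the rectangular form D^{-1/2} E^T is used instead of the symmetric form of Eq. (16.41).

```python
import numpy as np

def whiten(X, n_components=None):
    """Whiten the observed mixtures X (n_samples x n_variables).

    Sketch of Eqs. (16.40)-(16.41): C_x = E D E^T, W_S = E D^{-1/2} E^T.
    Assumes a nonsingular covariance matrix. If n_components is given,
    PCA dimensionality reduction keeps only the leading eigenvectors.
    """
    Xc = X - X.mean(axis=0)                      # center: E{x} = 0
    C = np.cov(Xc, rowvar=False)                 # sample estimate of C_x = E{x x^T}
    d, E = np.linalg.eigh(C)                     # eigendecomposition, Eq. (16.40)
    order = np.argsort(d)[::-1]                  # sort by decreasing eigenvalue
    d, E = d[order], E[:, order]
    if n_components is not None:                 # optional PCA reduction
        d, E = d[:n_components], E[:, :n_components]
        W = np.diag(1.0 / np.sqrt(d)) @ E.T      # reduced whitening: D^{-1/2} E^T
    else:
        W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # symmetric form, Eq. (16.41)
    Z = Xc @ W.T                                 # whitened data: cov(Z) ~ identity
    return Z, W
```

In either branch the whitened data Z has (approximately) identity covariance, which is the property required before applying the ICA estimation itself.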
16.5.2.3 Information Maximization and Maximum Likelihood Approaches
Another approach to find the independent components from observed mixtures is
the information maximization approach, named InfoMax, which consists in maximizing the joint output entropy of a neural network whose inputs are the observed
variables. The entropy of a variable can be seen as a measure of the information
that its observation gives: the more random the variable, the more information we
have from its observation, and so the higher its entropy. The outputs of the neural
network can be written as $y_i = g_i(\mathbf{w}_i^T \mathbf{x})$, with $g_i$ a nonlinear function applied to the linear combination $\mathbf{w}_i^T \mathbf{x}$. Maximizing the joint entropy of the outputs of this neural network is found to be equivalent to minimizing the mutual information among the estimated components $y_i$. The mutual information between a set of variables is an information theoretic criterion that quantifies the amount of information that the knowledge of one variable carries about the others. The mutual information of a set of random variables $y_i$ can be written as
$$I(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} H(y_i) - H(\mathbf{y}) \qquad (16.42)$$
where H (·) is the entropy. The first term is related to the amount of information
we get from the observation of the variables separately, whereas the second term
is related to the amount of information we get from the observation of all the
variables together. If the variables are statistically independent, we do not have
any additional information about any variable from the observation of any other,
and the entropy of the complete variable vector is the sum of the entropies of
the individual variables. In this case the mutual information equals zero. If there
is some redundancy in the variable set, it means that we can get some information
about some variable from the observation of the others, and the entropy of the
complete vector is lower than the sum of the individual entropies. This results in a mutual information greater than zero; minimizing the mutual information therefore drives the estimated components toward statistical independence.
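To make Eq. (16.42) and the discussion above concrete, the following sketch (our own illustration, not part of the text) estimates entropies from histograms and computes the mutual information of two variables as the sum of their marginal entropies minus their joint entropy: for independent variables the estimate is close to zero, while redundant variables give a clearly positive value.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) from a table of joint or marginal counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(y1, y2, bins=16):
    """Eq. (16.42) for two variables: I(y1, y2) = H(y1) + H(y2) - H(y1, y2)."""
    joint, _, _ = np.histogram2d(y1, y2, bins=bins)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)
print(mutual_information(a, b))           # independent variables: close to 0
print(mutual_information(a, a + 0.5 * b)) # redundant variables:   clearly > 0
```

The histogram-based estimate is slightly biased upward for finite samples, but the contrast between the independent and the redundant case illustrates the criterion that InfoMax implicitly minimizes.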