Environmental Engineering Reference
CICA then, via the Gaussian copula, is a generalization of PCA-type procedures where Θ is a more general non-linear space or 'dependency gradient'.
Lastly, note that under ordinary ICA any transform of the margins is arbitrary, and identifying the contrast gradient from the entropy is difficult. In CICA, via equation (9), the second term on the RHS is identifiable from the first term, allowing for Gaussianity in the sources.
The advantages of this approach over non-parametric component analysis models are:
1. Flexible choice of non-linear transformations $u$.
2. Superior convergence of parametric estimators and stability of parametric estimators on small datasets.
3. Specification, 'tuneability' and interpretability of dependency.
data are plotted in dark gray; estimated density is superimposed in blue. (b) Log Mean Integrated Squared Error (MISE) of fastICA (see Hyvärinen 1999) and the CICA model applied to mixed Gaussian and Laplacian sources (via the Gumbel-Hougaard copula), $S_1$ and $S_2$ above. The MISE is $\frac{1}{N}\sum_{n=1}^{N}\left(q_n(y_n)-\hat{q}_n(y_n)\right)^2$ as $N$ ranges from 10 to 10000. MISE for CICA is in blue, fastICA is in red. The y-axis is plotted on a log scale to highlight the difference: the distance between the two curves is on the order of $O(n^{-1/5})$. The CICA procedure has a marginally better error rate, and less variability over 100 random draws at each sample size. The mean MISE curves are plotted in darker color.
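As a rough illustration of the MISE quantity described in the figure caption, the sketch below averages the squared gap between a known density and an estimated one over the sample points, for several sample sizes. It is a minimal sketch under stated assumptions, not the fastICA or CICA procedure: the Laplace source, the maximum-likelihood Laplace fit used as the estimator, and all function names are illustrative choices that do not come from the text.

```python
import numpy as np


def laplace_pdf(y, loc=0.0, scale=1.0):
    """Density of a Laplace(loc, scale) distribution."""
    return np.exp(-np.abs(y - loc) / scale) / (2.0 * scale)


def empirical_mise(q_true, q_hat, y):
    """Mean squared gap between true and estimated density at the points y."""
    return np.mean((q_true(y) - q_hat(y)) ** 2)


def fit_laplace(y):
    """Maximum-likelihood Laplace fit (assumed stand-in estimator):
    sample median for the location, mean absolute deviation for the scale."""
    loc = np.median(y)
    scale = np.mean(np.abs(y - loc))
    return lambda x: laplace_pdf(x, loc, scale)


def log_mise_curve(sample_sizes, n_draws=100, seed=0):
    """Log of the mean MISE over n_draws random samples at each sample size."""
    rng = np.random.default_rng(seed)
    curve = {}
    for n in sample_sizes:
        errors = []
        for _ in range(n_draws):
            y = rng.laplace(0.0, 1.0, size=n)
            errors.append(empirical_mise(laplace_pdf, fit_laplace(y), y))
        curve[n] = np.log(np.mean(errors))
    return curve


if __name__ == "__main__":
    for n, value in log_mise_curve([10, 100, 1000, 10000]).items():
        print(n, value)
```

Plotting the returned values against the sample size on a log scale gives a curve of the same kind as the one the caption describes; comparing two estimators would mean running `log_mise_curve` once per estimator on the same draws.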
Unification of PCA/ICA via the Gaussian Copula
CICA, or ICA via the copula, yields a unifying
framework in which PCA procedures can be cast.
In the particular case of elliptical dependence we
can write the density of the copula as
$$dC_{\Theta}(u) = \phi\left(\left(u^{T}\Sigma^{-1}u\right)^{1/2}\right) = \phi(t) \qquad (16)$$
with Θ = the 'scatter' matrix for multivariate $x_k$, and where $\phi(t) \sim o(t^2)$. The Gaussian copula is a member of this family. In the full CICA procedure we minimize the expected log of the above via equation (11) for any copula-expressed 'dependency gradient'.
It is direct to note that the PCA program is a special case, in which the copula density matches the above (i.e. is Gaussian or elliptical) and the marginal mismatch (equation (9)) is ignored. Alternately, note that PCA via singular value decomposition (SVD) is a quadratic optimization, consonant with the expression of the elliptical copula density.
The main drawback of this method, especially on high-dimensional data, is the computational difficulty of the score maximization, equations (13) through (15). The full algorithm requires a non-linear optimization procedure on the full dimension of the data simultaneously.
The CICA Algorithm: Partite Model, Determinants of the ESI
Essentially, the full method is simultaneously
minimizing the mutual information and marginal
fits. In the fully parameterized setting, the joint
entropy of the outputs is
$$H(u) = \sum_{i=1}^{k} H(u_i) - I(u) \qquad (17)$$
and the full method is equivalent to the maximization of the above equation. Notice that $\sum_{i=1}^{k} H(u_i)$
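The entropy decomposition in equation (17) can be checked numerically in a case where every term has a closed form. The sketch below assumes zero-mean Gaussian outputs, for which the joint entropy, the marginal entropies and the mutual information (for more than two components, the total correlation) are all functions of the covariance matrix. The covariance values and function names are illustrative assumptions, not taken from the text.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a zero-mean Gaussian with covariance cov."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def gaussian_mutual_information(cov):
    """I(u) for a Gaussian: -0.5 * log det of the correlation matrix."""
    cov = np.asarray(cov)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    _, logdet = np.linalg.slogdet(corr)
    return -0.5 * logdet


# Hypothetical positive-definite covariance for k = 3 outputs.
cov = np.array([[2.0, 0.6, 0.3],
                [0.6, 1.0, 0.2],
                [0.3, 0.2, 1.5]])

joint = gaussian_entropy(cov)                      # H(u)
marginals = sum(gaussian_entropy(cov[i, i])        # sum_i H(u_i)
                for i in range(cov.shape[0]))
mi = gaussian_mutual_information(cov)              # I(u)

print(joint, marginals - mi)
```

With the marginal entropies held fixed, lowering the mutual information raises the joint entropy, which is the sense in which maximizing equation (17) favors independent outputs.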