where $\theta \equiv \{W, \mu, \tau\}$ is the parameter set. Since the maximum likelihood estimation of PPCA is identical to PCA, PPCA is a natural extension of PCA to a probabilistic model.
We present here a Bayesian estimation method for PPCA due to the original authors.
Bayesian estimation obtains the posterior distributions of $\theta$ and $X$ according to Bayes' theorem:
$$
p(\theta, X \mid Y) \propto p(Y, X \mid \theta)\, p(\theta). \tag{4.24}
$$
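As a concrete illustration of (4.24), the following minimal sketch (our own toy example, not from the original text) computes an unnormalized posterior over a single noise precision $\tau$ on a grid, multiplying a Gaussian likelihood by a hypothetical Gamma prior:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=50)   # observed data Y

tau_grid = np.linspace(0.05, 5.0, 400)        # candidate noise precisions
# ln p(Y | tau): zero-mean Gaussian likelihood with precision tau
log_like = np.array([stats.norm.logpdf(y, 0.0, 1.0 / np.sqrt(t)).sum()
                     for t in tau_grid])
# ln p(tau): hypothetical Gamma(shape=2, scale=0.5) prior
log_prior = stats.gamma.logpdf(tau_grid, a=2.0, scale=0.5)

log_post = log_like + log_prior               # Eq. (4.24), up to a constant
post = np.exp(log_post - log_post.max())      # stabilize before exponentiating
post /= post.sum()                            # normalize over the grid

print("posterior mean of tau:", (tau_grid * post).sum())
```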
$p(\theta)$ is called a prior distribution, which denotes the a priori preference for the parameter $\theta$. The prior distribution is a part of the model and must be defined before estimation. We assume conjugate priors for $\tau$ and $\mu$, and a hierarchical prior for $W$; namely, the prior for $W$, $p(W \mid \tau, \alpha)$, is parameterized by a hyperparameter $\alpha \in \mathbb{R}^K$:
$$
p(\theta \mid \alpha) \equiv p(\mu, W, \tau \mid \alpha) = p(\mu \mid \tau)\, p(\tau) \prod_{j=1}^{K} p(w_j \mid \tau, \alpha_j),
$$
$$
p(\mu \mid \tau) = \mathcal{N}\!\big(\mu \mid \bar{\mu}_0, (\gamma_{\mu_0} \tau)^{-1} I_m\big),
$$
$$
p(w_j \mid \tau, \alpha_j) = \mathcal{N}\!\big(w_j \mid 0, (\alpha_j \tau)^{-1} I_m\big),
$$
$$
p(\tau) = \mathcal{G}(\tau \mid \bar{\tau}_0, \gamma_{\tau_0}),
$$
where $\mathcal{G}(\tau \mid \bar{\tau}, \gamma_\tau)$ denotes a Gamma distribution with hyperparameters $\bar{\tau}$ and $\gamma_\tau$:
$$
\mathcal{G}(\tau \mid \bar{\tau}, \gamma_\tau) \equiv \frac{(\gamma_\tau \bar{\tau}^{-1})^{\gamma_\tau}}{\Gamma(\gamma_\tau)} \exp\!\left[-\gamma_\tau \bar{\tau}^{-1} \tau + (\gamma_\tau - 1)\ln\tau\right],
$$
where $\Gamma(\cdot)$ is the Gamma function.
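The factorized prior above can be evaluated numerically. The sketch below is our own illustration, assuming an $m \times K$ loading matrix $W$ with columns $w_j$; it encodes $\mathcal{G}(\tau \mid \bar{\tau}, \gamma_\tau)$ as a Gamma distribution with shape $\gamma_\tau$ and rate $\gamma_\tau/\bar{\tau}$, so that $\bar{\tau}$ is the prior mean of $\tau$:

```python
import numpy as np
from scipy import stats

def log_prior(mu, W, tau, alpha, mu0_bar, gamma_mu0, tau0_bar, gamma_tau0):
    """ln p(mu, W, tau | alpha) for the factorized prior above.

    Assumes W is m x K with columns w_j and alpha has length K.
    G(tau | tau_bar, gamma) is encoded as a Gamma distribution with
    shape gamma and rate gamma / tau_bar, so tau_bar is the prior mean.
    """
    m, K = W.shape
    # p(mu | tau) = N(mu | mu0_bar, (gamma_mu0 * tau)^{-1} I_m)
    lp = stats.norm.logpdf(mu, mu0_bar, 1.0 / np.sqrt(gamma_mu0 * tau)).sum()
    # p(tau) = G(tau | tau0_bar, gamma_tau0)
    lp += stats.gamma.logpdf(tau, a=gamma_tau0, scale=tau0_bar / gamma_tau0)
    # p(w_j | tau, alpha_j) = N(w_j | 0, (alpha_j * tau)^{-1} I_m)
    for j in range(K):
        lp += stats.norm.logpdf(W[:, j], 0.0,
                                1.0 / np.sqrt(alpha[j] * tau)).sum()
    return lp

# Example with arbitrary values: m = 5 dimensions, K = 2 axes.
print(log_prior(np.zeros(5), 0.1 * np.ones((5, 2)), 1.0,
                np.ones(2), 0.0, 1.0, 1.0, 1.0))
```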
The variables used in the above priors, $\gamma_{\mu_0}$, $\bar{\mu}_0$, $\gamma_{\tau_0}$ and $\bar{\tau}_0$, are deterministic hyperparameters that define the prior. Their actual values should be given before the estimation. We set $\gamma_{\mu_0} = \gamma_{\tau_0} = 10^{-10}$, $\bar{\mu}_0 = 0$ and $\bar{\tau}_0 = 1$, which corresponds to an almost non-informative prior.
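To see why these settings are almost non-informative, note that with $\gamma_{\tau_0} = 10^{-10}$ the Gamma prior on $\tau$ is nearly flat on a logarithmic scale. A quick hypothetical check (our own, printing the log-density of $\ln\tau$, which would be exactly constant for a scale-invariant prior):

```python
import numpy as np
from scipy import stats

gamma_tau0, tau0_bar = 1e-10, 1.0   # the settings quoted above
scale = tau0_bar / gamma_tau0       # Gamma scale = tau_bar / gamma

# Log-density of ln(tau): ln p(tau) + ln(tau).  For a scale-invariant
# (fully non-informative) prior this is constant; here it is nearly so.
for tau in [1e-3, 1.0, 1e3]:
    print(tau, stats.gamma.logpdf(tau, a=gamma_tau0, scale=scale) + np.log(tau))
```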
Assuming the priors and given a whole data set $Y = \{y\}$, the type-II maximum likelihood hyperparameter $\alpha_{\mathrm{ML\text{-}II}}$ and the posterior distribution of the parameter, $q(\theta) = p(\theta \mid Y, \alpha_{\mathrm{ML\text{-}II}})$, are obtained by Bayesian estimation.
The hierarchical prior $p(W \mid \alpha, \tau)$, which is called an automatic relevance determination (ARD) prior, has an important role in BPCA. The $j$th principal axis $w_j$ has a Gaussian prior, and its variance $1/(\alpha_j \tau)$ is controlled by a hyperparameter $\alpha_j$, which is determined by type-II maximum likelihood estimation from the data. When the Euclidean norm of the principal axis, $\|w_j\|$, is small relative to the noise variance $1/\tau$, the hyperparameter $\alpha_j$ becomes large and the principal axis $w_j$ shrinks to nearly zero. Thus, redundant principal axes are automatically suppressed.
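A schematic sketch of this shrinkage, using the simplified fixed-point update $\alpha_j \leftarrow m/(\tau\|w_j\|^2)$ (our own illustration in the spirit of ARD, not the exact BPCA algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
m, K, tau = 20, 4, 4.0              # dimensions, latent axes, noise precision

# Hypothetical estimated principal axes: two strong, two near the noise level.
W = np.zeros((m, K))
W[:, 0] = rng.normal(0.0, 2.0, m)   # strong axis
W[:, 1] = rng.normal(0.0, 1.0, m)   # strong axis
W[:, 2] = rng.normal(0.0, 0.05, m)  # redundant axis
W[:, 3] = rng.normal(0.0, 0.02, m)  # redundant axis

# Simplified type-II ML fixed point: alpha_j = m / (tau * ||w_j||^2).
# Weak axes get a huge alpha_j, i.e. a prior tightly peaked at w_j = 0.
alpha = m / (tau * (W ** 2).sum(axis=0))

for j in range(K):
    print(f"axis {j}: ||w_j|| = {np.linalg.norm(W[:, j]):8.4f},"
          f" alpha_j = {alpha[j]:12.2f}")
```

In the full algorithm this update alternates with re-estimation of $W$, so axes that receive a large $\alpha_j$ are pulled toward zero at the next step; the snippet shows only the $\alpha$ side of that loop.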
 
 