4.4.3.3 EM-Like Repetitive Algorithm
If we know the true parameter $\theta_{\mathrm{true}}$, the posterior of the MVs is given by
$$q(Y_{\mathrm{miss}}) = p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta_{\mathrm{true}}),$$
which produces estimation equivalent to the PC regression. Here, $p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta_{\mathrm{true}})$ is obtained by marginalizing the likelihood (4.24) with respect to the observed variables $Y_{\mathrm{obs}}$. If we have the parameter posterior
$q(\theta)$ instead of the true parameter, the posterior of the MVs is given by
$$q(Y_{\mathrm{miss}}) = \int d\theta\, q(\theta)\, p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta),$$
which corresponds to the Bayesian PC regression. Since, naturally, we do not know the true parameter, we conduct the BPCA. Although the parameter posterior $q(\theta)$ can be easily obtained by Bayesian estimation when a complete data set $Y$ is available, we assume that only a part of $Y$, $Y_{\mathrm{obs}}$, is observed and the rest, $Y_{\mathrm{miss}}$, is missing. In that situation, $q(\theta)$ and $q(Y_{\mathrm{miss}})$ must be obtained simultaneously.
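The integral defining the Bayesian PC regression can be read as averaging the conditional (PC-regression-like) estimate over samples from the parameter posterior. A minimal Monte Carlo sketch, with a hypothetical unit-variance bivariate Gaussian standing in for the PCA likelihood and its correlation playing the role of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the PCA likelihood: (y_obs, y_miss) is a
# unit-variance bivariate Gaussian and the unknown parameter theta is
# its correlation, so p(y_miss | y_obs, theta) has mean theta * y_obs.
y_obs = 1.2

# Pretend parameter posterior q(theta): 5000 samples of the correlation.
theta_samples = rng.uniform(0.5, 0.9, size=5000)

# q(y_miss) = int dtheta q(theta) p(y_miss | y_obs, theta) is a mixture
# of the conditional posteriors; its mean averages the conditional
# estimates theta * y_obs over the parameter posterior.
cond_means = theta_samples * y_obs
predictive_mean = cond_means.mean()
```

With the true parameter known, the mixture collapses to a single conditional posterior, recovering the plain PC-regression case above.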
We use a variational Bayes (VB) algorithm in order to execute Bayesian estimation for both the model parameter $\theta$ and the MVs $Y_{\mathrm{miss}}$. Although the VB algorithm resembles the EM algorithm, which obtains maximum likelihood estimators for $\theta$ and $Y_{\mathrm{miss}}$, it obtains the posterior distributions for $\theta$ and $Y_{\mathrm{miss}}$, namely $q(\theta)$ and $q(Y_{\mathrm{miss}})$, by a repetitive algorithm.
The VB algorithm is implemented as follows: (a) the posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$, is initialized by imputing each of the MVs to the instance-wise average; (b) the posterior distribution of the parameter, $q(\theta)$, is estimated using the observed data $Y_{\mathrm{obs}}$ and the current posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$; (c) the posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$, is estimated using the current $q(\theta)$; (d) the hyperparameter $\alpha$ is updated using both the current $q(\theta)$ and the current $q(Y_{\mathrm{miss}})$; (e) steps (b)-(d) are repeated until convergence.
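The loop (a)-(e) can be sketched with point estimates in place of the full posteriors. The following simplified EM-style iteration (a hypothetical helper `em_like_impute`, not the BPCA implementation itself; the $\alpha$ update of step (d) and the posterior distributions are omitted) alternates parameter fitting and MV re-estimation:

```python
import numpy as np

def em_like_impute(Y, n_components=2, max_iter=100, tol=1e-6):
    """Hypothetical sketch of steps (a)-(e) with point estimates:
    instance-wise-average initialization (a), then alternating
    parameter fitting (b) and MV re-estimation (c) until the imputed
    values stop changing (e). The hyperparameter update (d) and the
    posteriors of BPCA are omitted in this EM-style simplification."""
    Y = np.array(Y, dtype=float)          # work on a copy
    miss = np.isnan(Y)
    # (a) initialize each MV to its instance-wise (row) average
    row_mean = np.nanmean(Y, axis=1)
    Y[miss] = row_mean[np.where(miss)[0]]
    for _ in range(max_iter):
        Y_old = Y.copy()
        # (b) "parameter estimation": principal axes of the completed data
        mu = Y.mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
        W = Vt[:n_components]             # top principal directions
        # (c) re-estimate the MVs from the low-rank reconstruction
        recon = ((Y - mu) @ W.T) @ W + mu
        Y[miss] = recon[miss]
        # (e) stop once successive imputations barely change
        if np.max(np.abs(Y - Y_old)) < tol:
            break
    return Y

# Tiny demo on noiseless rank-1 data with one entry removed
rng = np.random.default_rng(0)
Y_full = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 4))
Y_part = Y_full.copy()
Y_part[0, 2] = np.nan
Y_imputed = em_like_impute(Y_part, n_components=1)
```

Replacing the point estimates in (b) and (c) with the posteriors $q(\theta)$ and $q(Y_{\mathrm{miss}})$, and adding the $\alpha$ update of (d), turns this skeleton into the VB iteration described above.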
The VB algorithm has been proved to converge to a locally optimal solution.
Although the convergence to the global optimum is not guaranteed, the VB algorithm
for BPCA almost always converges to a single solution. This is probably because
the objective function of BPCA has a simple landscape. As a consequence of the VB
algorithm, therefore, $q(\theta)$ and $q(Y_{\mathrm{miss}})$ are expected to approach the globally optimal posteriors.
Then, the MVs in the expression matrix are imputed to the expectation with respect to the estimated posterior distribution:
$$\widehat{Y}_{\mathrm{miss}} = \int y_{\mathrm{miss}}\, q(Y_{\mathrm{miss}})\, dY_{\mathrm{miss}}. \qquad (4.25)$$
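Given draws from the estimated posterior $q(Y_{\mathrm{miss}})$, the expectation in (4.25) is simply a sample average. A minimal sketch with a hypothetical Gaussian posterior for a single MV:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior q(Y_miss) for one missing entry: Gaussian with
# mean 2.0 and standard deviation 0.3, represented by 10,000 samples.
samples = rng.normal(loc=2.0, scale=0.3, size=10_000)

# Equation (4.25): impute the MV with its posterior expectation,
# i.e. the sample average of the posterior draws.
y_miss_hat = samples.mean()
```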