4.4.3.3 EM-Like Repetitive Algorithm
If we know the true parameter θ_true, the posterior of the MVs is given by

q(Y_miss) = p(Y_miss | Y_obs, θ_true),

which produces an estimation equivalent to the PC regression. Here, p(Y_miss | Y_obs, θ) is obtained by marginalizing the likelihood (4.24) with respect to the observed variables Y_obs. If we have the parameter posterior q(θ) instead of the true parameter, the posterior of the MVs is given by
q(Y_miss) = ∫ dθ q(θ) p(Y_miss | Y_obs, θ),
which corresponds to the Bayesian PC regression. Since, naturally, we do not know the true parameter, we conduct the BPCA. Although the parameter posterior q(θ) can easily be obtained by Bayesian estimation when a complete data set Y is available, we assume that only a part of Y, Y_obs, is observed and the rest, Y_miss, is missing. In that situation, it is required to obtain q(θ) and q(Y_miss) simultaneously.
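To make the integral over q(θ) concrete, here is a minimal Monte Carlo sketch. It uses a hypothetical one-dimensional Gaussian stand-in for q(θ) and p(Y_miss | Y_obs, θ), not the full BPCA likelihood; the point is that averaging the conditional posterior over draws of θ yields a predictive distribution widened by the parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D toy model (not the BPCA likelihood):
# parameter posterior q(theta) = N(mu_q, s_q^2), and, given theta,
# the missing value follows p(y_miss | theta) = N(theta, s_n^2).
mu_q, s_q, s_n = 2.0, 0.5, 1.0

# Monte Carlo approximation of q(Y_miss) = ∫ dθ q(θ) p(Y_miss | Y_obs, θ):
theta = rng.normal(mu_q, s_q, size=100_000)   # draws from q(theta)
y_miss = rng.normal(theta, s_n)               # one predictive draw per theta

# The predictive mean equals the mean of q(theta), while the predictive
# variance adds the parameter uncertainty s_q^2 to the noise variance s_n^2.
print(y_miss.mean())   # ≈ 2.0
print(y_miss.var())    # ≈ s_q**2 + s_n**2 = 1.25
```

Fixing θ at a single point estimate would instead reproduce the narrower conditional posterior, which is the PC-regression case described above.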
We use a variational Bayes (VB) algorithm in order to execute Bayesian estimation for both the model parameter θ and the MVs Y_miss. Although the VB algorithm resembles the EM algorithm, which obtains maximum likelihood estimators for θ and Y_miss, it obtains the posterior distributions for θ and Y_miss, q(θ) and q(Y_miss), by a repetitive algorithm.
The VB algorithm is implemented as follows: (a) the posterior distribution of the MVs, q(Y_miss), is initialized by imputing each of the MVs with the instance-wise average; (b) the posterior distribution of the parameter θ, q(θ), is estimated using the observed data Y_obs and the current posterior distribution of the MVs, q(Y_miss); (c) the posterior distribution of the MVs, q(Y_miss), is estimated using the current q(θ); (d) the hyperparameter α is updated using both the current q(θ) and the current q(Y_miss); (e) steps (b)-(d) are repeated until convergence.
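As an illustration only, the loop (a)-(e) can be sketched with point estimates standing in for the posterior distributions, giving a plain EM-like PCA imputation. The full VB algorithm additionally maintains the distributions q(θ) and q(Y_miss) and the hyperparameter α; the function name and its arguments below are hypothetical, not from the source.

```python
import numpy as np

def em_like_pca_impute(Y, n_components=2, n_iter=50, tol=1e-6):
    """EM-like analogue of steps (a)-(e), with point estimates
    standing in for the posteriors q(theta) and q(Y_miss)."""
    Y = np.array(Y, dtype=float)
    miss = np.isnan(Y)

    # (a) initialize each MV with its instance-wise (row) average
    row_mean = np.nanmean(Y, axis=1, keepdims=True)
    Y[miss] = np.broadcast_to(row_mean, Y.shape)[miss]

    for _ in range(n_iter):
        # (b) "parameter estimation": rank-k PCA of the completed matrix
        mu = Y.mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
        recon = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]

        # (c) re-estimate the MVs from the current model
        new_vals = recon[miss]
        delta = np.max(np.abs(new_vals - Y[miss]), initial=0.0)
        Y[miss] = new_vals

        # (e) repeat until convergence; the hyperparameter update (d)
        # has no counterpart in this point-estimate sketch
        if delta < tol:
            break
    return Y
```

On data that is exactly low-rank, the imputed entries of this sketch converge to the low-rank reconstruction of the missing cells; BPCA replaces both point estimates with full posteriors and therefore also quantifies the uncertainty of each imputation.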
The VB algorithm has been proved to converge to a locally optimal solution.
Although the convergence to the global optimum is not guaranteed, the VB algorithm
for BPCA almost always converges to a single solution. This is probably because
the objective function of BPCA has a simple landscape. As a consequence of the VB
algorithm, therefore, q(θ) and q(Y_miss) are expected to approach the globally optimal posteriors.
Then, the MVs in the expression matrix are imputed with the expectation with respect to the estimated posterior distribution:

Ŷ_miss = ∫ y_miss q(Y_miss) dY_miss.    (4.25)
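As a toy check of Eq. (4.25) for a single one-dimensional MV, assume (arbitrarily, for illustration) that the estimated posterior q(Y_miss) is Gaussian; the imputed value is then its mean, which can be recovered by numerical integration over a grid:

```python
import numpy as np

# Assumed Gaussian posterior q(y_miss) = N(mu, sigma^2) for this toy check
mu, sigma = 1.5, 0.4

# grid wide enough (±8 sigma) that the truncated tails are negligible
y = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 20001)
q = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Eq. (4.25): impute the MV with its posterior expectation ∫ y q(y) dy
y_hat = np.sum(y * q) * (y[1] - y[0])
print(y_hat)  # ≈ 1.5, the posterior mean
```

For the Gaussian posteriors produced by BPCA this expectation is available in closed form; the numerical quadrature above is only meant to make the defining integral tangible.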
 