4.4.3.3 EM-Like Repetitive Algorithm
If we know the true parameter $\theta_{\mathrm{true}}$, the posterior of the MVs is given by
$$q(Y_{\mathrm{miss}}) = p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta_{\mathrm{true}}),$$
which produces estimation equivalent to the PC regression. Here, $p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta_{\mathrm{true}})$ is obtained by marginalizing the likelihood (4.24) with respect to the observed variables $Y_{\mathrm{obs}}$. If we have the parameter posterior
$q(\theta)$ instead of the true parameter, the posterior of the MVs is given by
$$q(Y_{\mathrm{miss}}) = \int d\theta\, q(\theta)\, p(Y_{\mathrm{miss}} \mid Y_{\mathrm{obs}}, \theta),$$
which corresponds to the Bayesian PC regression. Since, naturally, we do not know the true parameter, we conduct the BPCA. Although the parameter posterior $q(\theta)$ can be easily obtained by Bayesian estimation when a complete data set $Y$ is available, we assume that only a part of $Y$, $Y_{\mathrm{obs}}$, is observed and the rest, $Y_{\mathrm{miss}}$, is missing. In that situation, $q(\theta)$ and $q(Y_{\mathrm{miss}})$ must be obtained simultaneously.
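The integral defining the Bayesian PC regression can be read as averaging the conditional (PC-regression-like) estimate over samples from the parameter posterior. A minimal Monte Carlo sketch, with a hypothetical unit-variance bivariate Gaussian standing in for the PCA likelihood and its correlation playing the role of $\theta$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for the PCA likelihood: (y_obs, y_miss) is a
# unit-variance bivariate Gaussian and the unknown parameter theta is
# its correlation, so p(y_miss | y_obs, theta) has mean theta * y_obs.
y_obs = 1.2

# Pretend parameter posterior q(theta): 5000 samples of the correlation.
theta_samples = rng.uniform(0.5, 0.9, size=5000)

# q(y_miss) = int dtheta q(theta) p(y_miss | y_obs, theta) is a mixture
# of the conditional posteriors; its mean averages the conditional
# estimates theta * y_obs over the parameter posterior.
cond_means = theta_samples * y_obs
predictive_mean = cond_means.mean()
```

With the true parameter known, the mixture collapses to a single conditional posterior, recovering the plain PC-regression case above.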
We use a variational Bayes (VB) algorithm in order to execute Bayesian estimation for both the model parameter $\theta$ and the MVs $Y_{\mathrm{miss}}$. Although the VB algorithm resembles the EM algorithm, which obtains maximum likelihood estimators for $\theta$ and $Y_{\mathrm{miss}}$, it obtains the posterior distributions for $\theta$ and $Y_{\mathrm{miss}}$, namely $q(\theta)$ and $q(Y_{\mathrm{miss}})$, by a repetitive algorithm.
The VB algorithm is implemented as follows: (a) the posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$, is initialized by imputing each of the MVs to the instance-wise average; (b) the posterior distribution of the parameter, $q(\theta)$, is estimated using the observed data $Y_{\mathrm{obs}}$ and the current posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$; (c) the posterior distribution of the MVs, $q(Y_{\mathrm{miss}})$, is estimated using the current $q(\theta)$; (d) the hyperparameter $\alpha$ is updated using both the current $q(\theta)$ and the current $q(Y_{\mathrm{miss}})$; (e) steps (b)-(d) are repeated until convergence.
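The loop (a)-(e) can be sketched with point estimates in place of the full posteriors. The following simplified EM-style iteration (a hypothetical helper `em_like_impute`, not the BPCA implementation itself; the $\alpha$ update of step (d) and the posterior distributions are omitted) alternates parameter fitting and MV re-estimation:

```python
import numpy as np

def em_like_impute(Y, n_components=2, max_iter=100, tol=1e-6):
    """Hypothetical sketch of steps (a)-(e) with point estimates:
    instance-wise-average initialization (a), then alternating
    parameter fitting (b) and MV re-estimation (c) until the imputed
    values stop changing (e). The hyperparameter update (d) and the
    posteriors of BPCA are omitted in this EM-style simplification."""
    Y = np.array(Y, dtype=float)          # work on a copy
    miss = np.isnan(Y)
    # (a) initialize each MV to its instance-wise (row) average
    row_mean = np.nanmean(Y, axis=1)
    Y[miss] = row_mean[np.where(miss)[0]]
    for _ in range(max_iter):
        Y_old = Y.copy()
        # (b) "parameter estimation": principal axes of the completed data
        mu = Y.mean(axis=0)
        U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
        W = Vt[:n_components]             # top principal directions
        # (c) re-estimate the MVs from the low-rank reconstruction
        recon = ((Y - mu) @ W.T) @ W + mu
        Y[miss] = recon[miss]
        # (e) stop once successive imputations barely change
        if np.max(np.abs(Y - Y_old)) < tol:
            break
    return Y

# Tiny demo on noiseless rank-1 data with one entry removed
rng = np.random.default_rng(0)
Y_full = rng.normal(size=(20, 1)) @ rng.normal(size=(1, 4))
Y_part = Y_full.copy()
Y_part[0, 2] = np.nan
Y_imputed = em_like_impute(Y_part, n_components=1)
```

Replacing the point estimates in (b) and (c) with the posteriors $q(\theta)$ and $q(Y_{\mathrm{miss}})$, and adding the $\alpha$ update of (d), turns this skeleton into the VB iteration described above.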
The VB algorithm has been proved to converge to a locally optimal solution.
Although the convergence to the global optimum is not guaranteed, the VB algorithm
for BPCA almost always converges to a single solution. This is probably because
the objective function of BPCA has a simple landscape. As a consequence of the VB
algorithm, therefore, $q(\theta)$ and $q(Y_{\mathrm{miss}})$ are expected to approach the globally optimal posteriors.
Then, the MVs in the expression matrix are imputed to the expectation with respect to the estimated posterior distribution:
$$\widehat{Y}_{\mathrm{miss}} = \int y_{\mathrm{miss}}\, q(Y_{\mathrm{miss}})\, dY_{\mathrm{miss}}. \qquad (4.25)$$
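Given draws from the estimated posterior $q(Y_{\mathrm{miss}})$, the expectation in (4.25) is simply a sample average. A minimal sketch with a hypothetical Gaussian posterior for a single MV:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical posterior q(Y_miss) for one missing entry: Gaussian with
# mean 2.0 and standard deviation 0.3, represented by 10,000 samples.
samples = rng.normal(loc=2.0, scale=0.3, size=10_000)

# Equation (4.25): impute the MV with its posterior expectation,
# i.e. the sample average of the posterior draws.
y_miss_hat = samples.mean()
```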