where $\theta \equiv \{W, \mu, \tau\}$ is the parameter set. Since maximum likelihood estimation of PPCA is identical to PCA, PPCA is a natural extension of PCA to a probabilistic model.
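As an illustration of that equivalence, the closed-form maximum likelihood solution of PPCA (a standard result due to Tipping and Bishop, not derived in this text) builds $W$ from the leading eigenvectors and eigenvalues of the sample covariance, so the estimated principal subspace coincides with the PCA subspace. The following is a minimal numerical sketch of that check; the variable names and the use of NumPy are ours, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, n = 5, 2, 1000                      # observed dim, latent dim, samples

# synthetic data from a PPCA-like generative model: y = W x + noise
W_true = rng.normal(size=(m, K))
Y = rng.normal(size=(n, K)) @ W_true.T + rng.normal(scale=0.1, size=(n, m))

S = np.cov(Y, rowvar=False)               # sample covariance
eigval, eigvec = np.linalg.eigh(S)
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # sort descending

# ML solution of PPCA (Tipping & Bishop), up to an arbitrary rotation:
#   sigma2 = average of the discarded eigenvalues,
#   W_ml   = U_K (Lambda_K - sigma2 I)^{1/2}
sigma2 = eigval[K:].mean()
W_ml = eigvec[:, :K] @ np.diag(np.sqrt(eigval[:K] - sigma2))

# the ML principal subspace equals the PCA subspace spanned by the top eigenvectors
P_ppca = W_ml @ np.linalg.pinv(W_ml)      # projector onto span(W_ml)
P_pca = eigvec[:, :K] @ eigvec[:, :K].T   # projector onto the PCA subspace
print(np.allclose(P_ppca, P_pca, atol=1e-8))   # True
```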
We present here a Bayesian estimation method for PPCA proposed by its original authors. Bayesian estimation obtains the posterior distribution of $\theta$ and $X$ according to Bayes' theorem:
$$
p(\theta, X \mid Y) \propto p(Y, X \mid \theta)\, p(\theta). \qquad (4.24)
$$
$p(\theta)$ is called a prior distribution, which denotes an a priori preference for the parameter $\theta$. The prior distribution is a part of the model and must be defined before estimation. We assume conjugate priors for $\tau$ and $\mu$, and a hierarchical prior for $W$; namely, the prior for $W$, $p(W \mid \tau, \alpha)$, is parameterized by a hyperparameter $\alpha \in \mathbb{R}^{K}$:
$$
\begin{aligned}
p(\theta \mid \alpha) &\equiv p(\mu, W, \tau \mid \alpha) = p(\mu \mid \tau)\, p(\tau) \prod_{j=1}^{K} p(w_j \mid \tau, \alpha_j),\\
p(\mu \mid \tau) &= \mathcal{N}\bigl(\mu \mid \mu_0, (\gamma_{\mu 0}\,\tau)^{-1} I_m\bigr),\\
p(w_j \mid \tau, \alpha_j) &= \mathcal{N}\bigl(w_j \mid 0, (\alpha_j \tau)^{-1} I_m\bigr),\\
p(\tau) &= \mathcal{G}(\tau \mid \tau_0, \gamma_{\tau 0}),
\end{aligned}
$$
where $\mathcal{G}(\tau \mid \bar{\tau}, \gamma_\tau)$ denotes a Gamma distribution with hyperparameters $\bar{\tau}$ and $\gamma_\tau$:
$$
\mathcal{G}(\tau \mid \bar{\tau}, \gamma_\tau) \equiv \frac{(\gamma_\tau \bar{\tau}^{-1})^{\gamma_\tau}}{\Gamma(\gamma_\tau)} \exp\!\left[ -\gamma_\tau \bar{\tau}^{-1} \tau + (\gamma_\tau - 1)\ln \tau \right],
$$
where $\Gamma(\cdot)$ is a Gamma function.
The variables used in the above priors, $\gamma_{\mu 0}$, $\mu_0$, $\gamma_{\tau 0}$ and $\tau_0$, are deterministic hyperparameters that define the prior. Their actual values should be given before the estimation. We set $\gamma_{\mu 0} = \gamma_{\tau 0} = 10^{-10}$, $\mu_0 = 0$ and $\tau_0 = 1$, which corresponds to an almost non-informative prior.
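To make the prior structure concrete, here is a small sketch that evaluates the log prior $\ln p(\theta \mid \alpha)$ under the densities above, using the almost non-informative hyperparameter values just given. The function and variable names are ours, and SciPy's Gamma parameterization is mapped onto $(\tau_0, \gamma_{\tau 0})$ as noted in the comments; treat this as an illustrative sketch rather than the authors' implementation.

```python
import numpy as np
from scipy import stats

m, K = 5, 2                                    # data dimension and number of latent axes

# almost non-informative hyperparameters, as in the text
gamma_mu0 = gamma_tau0 = 1e-10
mu0, tau0 = np.zeros(m), 1.0

def log_prior(mu, W, tau, alpha):
    """ln p(theta|alpha) = ln p(mu|tau) + ln p(tau) + sum_j ln p(w_j|tau, alpha_j)."""
    # p(mu | tau) = N(mu | mu0, (gamma_mu0 * tau)^(-1) I_m)
    lp = stats.multivariate_normal.logpdf(mu, mean=mu0,
                                          cov=np.eye(m) / (gamma_mu0 * tau))
    # p(w_j | tau, alpha_j) = N(w_j | 0, (alpha_j * tau)^(-1) I_m)
    for j in range(K):
        lp += stats.multivariate_normal.logpdf(W[:, j], mean=np.zeros(m),
                                               cov=np.eye(m) / (alpha[j] * tau))
    # p(tau) = G(tau | tau0, gamma_tau0): shape gamma_tau0, rate gamma_tau0 / tau0,
    # so the prior mean of tau is tau0
    lp += stats.gamma.logpdf(tau, a=gamma_tau0, scale=tau0 / gamma_tau0)
    return lp

# example evaluation at an arbitrary parameter setting
print(log_prior(mu=np.zeros(m), W=np.ones((m, K)), tau=1.0, alpha=np.ones(K)))
```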
Assuming the priors and given a whole data set $Y = \{y\}$, the type-II maximum likelihood hyperparameter $\alpha_{\mathrm{ML\text{-}II}}$ and the posterior distribution of the parameter, $q(\theta) = p(\theta \mid Y, \alpha_{\mathrm{ML\text{-}II}})$, are obtained by Bayesian estimation.
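For clarity, type-II maximum likelihood (also known as evidence maximization or empirical Bayes) here means choosing the hyperparameter to maximize the marginal likelihood of the observed data; written out in a standard form that is supplied by us rather than quoted from the text,
$$
\alpha_{\mathrm{ML\text{-}II}} = \arg\max_{\alpha}\, p(Y \mid \alpha) = \arg\max_{\alpha} \int p(Y, X \mid \theta)\, p(\theta \mid \alpha)\, dX\, d\theta ,
$$
and the parameter posterior $q(\theta)$ is then taken at that fixed hyperparameter value.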
The hierarchical prior $p(W \mid \alpha, \tau)$, which is called an automatic relevance determination (ARD) prior, has an important role in BPCA. The $j$-th principal axis $w_j$ has a Gaussian prior, and its variance $1/(\alpha_j \tau)$ is controlled by a hyperparameter $\alpha_j$, which is determined by type-II maximum likelihood estimation from the data. When the Euclidean norm of the principal axis, $\|w_j\|$, is small relative to the noise variance $1/\tau$, the hyperparameter $\alpha_j$ becomes large and the principal axis $w_j$ shrinks nearly to $0$. Thus, redundant principal axes are automatically suppressed.
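The shrinkage mechanism can be illustrated numerically. In variational implementations of BPCA the type-II ML update for the hyperparameter typically takes a form like $\alpha_j \approx m / \|w_j\|^2$; this particular update is our assumption, following common ARD practice, not a formula given in this text. An axis whose norm is small relative to the noise level then receives a large $\alpha_j$, and the Gaussian prior $\mathcal{N}(w_j \mid 0, (\alpha_j \tau)^{-1} I_m)$ pulls that axis toward zero. A minimal sketch:

```python
import numpy as np

m, tau = 10, 100.0                      # data dimension and noise precision (1/tau = noise variance)

# two candidate principal axes: one with a clearly non-zero norm,
# one whose norm is close to the noise scale 1/sqrt(tau) = 0.1
w_strong = np.full(m, 0.5)
w_weak = np.full(m, 0.01)

for name, w in [("strong", w_strong), ("weak", w_weak)]:
    alpha_j = m / np.sum(w ** 2)        # assumed ARD-style type-II ML update
    prior_var = 1.0 / (alpha_j * tau)   # variance of the Gaussian prior on w_j
    print(f"{name}: ||w|| = {np.linalg.norm(w):.3f}, "
          f"alpha_j = {alpha_j:.1f}, prior variance = {prior_var:.2e}")

# the "weak" axis gets a much larger alpha_j, i.e. a much tighter prior around 0,
# so repeated posterior updates shrink it further -- redundant axes are suppressed
```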