parameters correspond to the parameters of each distribution (i.e., shape and scale for the gamma distribution, and mean and standard deviation for the normal and lognormal distributions). The details of the EM algorithm are as follows:
1. Initialize $\theta_A$ and $\theta_B$. This is done by first randomly partitioning the dataset into $K$ groups and then calculating method-of-moments estimates for each of the groups.
2. M-step: Given $p(x \mid j)$, maximize the loglikelihood (LL) with respect to the parameters $\theta_A$ and $\theta_B$. We obtain the maximizer for $[\theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K}]$ numerically.
3. E-step: Given the parameter estimates from the M-step, we compute:
$$p(x \mid j) = E\left[\, p(x \mid j) \mid x, \theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K} \,\right] \quad (4)$$
$$= \frac{p(x \mid j)}{\sum_{j=1}^{K} p(x \mid j)} \quad (5)$$
4. Repeat the M-step and E-step until the change in the loglikelihood (LL) is negligible.
In order to avoid local maxima, we run the above EM algorithm ten times with
different starting points.
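The paper gives no code, so the following is only a minimal Python sketch of this procedure for gamma components (the lognormal case is analogous). All names (fit_gamma_mixture, moment_init, and so on) are our own illustration, not from the original work, and the loop uses the standard E-then-M ordering.

```python
import numpy as np
from scipy import optimize, stats

def moment_init(x, K, rng):
    """Step 1: randomly partition the data into K groups and take
    method-of-moments estimates (shape, scale) for each group."""
    groups = rng.integers(0, K, size=len(x))
    params = []
    for k in range(K):
        g = x[groups == k]
        m, v = g.mean(), g.var()
        params.append([m * m / v, v / m])   # gamma: shape = m^2/v, scale = v/m
    return np.array(params), np.full(K, 1.0 / K)

def e_step(x, params, weights):
    """Step 3 (equations (4)-(5)): responsibilities p(x|j) normalized
    over the K components, plus the loglikelihood of the current fit."""
    dens = np.array([w * stats.gamma.pdf(x, a, scale=s)
                     for (a, s), w in zip(params, weights)])  # shape (K, N)
    total = dens.sum(axis=0)
    return dens / total, np.log(total).sum()

def m_step(x, resp):
    """Step 2: numerically maximize the responsibility-weighted
    loglikelihood for each component's (shape, scale)."""
    params = []
    for r in resp:
        m = np.average(x, weights=r)
        v = np.average((x - m) ** 2, weights=r)
        x0 = np.log([m * m / v, v / m])       # start from moment estimates
        def neg_ll(p, r=r):
            a, s = np.exp(p)                  # log scale keeps a, s positive
            return -(r * stats.gamma.logpdf(x, a, scale=s)).sum()
        res = optimize.minimize(neg_ll, x0, method="Nelder-Mead")
        params.append(np.exp(res.x))
    return np.array(params), resp.mean(axis=1)

def fit_gamma_mixture(x, K, n_restarts=10, tol=1e-6, max_iter=500, seed=0):
    """Step 4: alternate E- and M-steps until the loglikelihood change
    is negligible; restart ten times to avoid local maxima."""
    rng = np.random.default_rng(seed)
    best_ll, best_model = -np.inf, None
    for _ in range(n_restarts):
        params, weights = moment_init(x, K, rng)
        old_ll = -np.inf
        for _ in range(max_iter):
            resp, ll = e_step(x, params, weights)
            if abs(ll - old_ll) < tol:
                break
            old_ll = ll
            params, weights = m_step(x, resp)
        if ll > best_ll:
            best_ll, best_model = ll, (params, weights)
    return best_ll, best_model
```

Optimizing on the log scale is one simple way to keep the shape and scale parameters positive without a constrained optimizer.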
2.2 Model Selection
When fitting mixture models to expression data, it is desirable to choose an appropriate number of components, one that fits the data well but does not overfit. For this task we tried two information criteria: AIC (Akaike Information Criterion [10]) and BIC (Bayesian Information Criterion [11]). Specifically:
$$AIC = -2\,LL + 2c \quad (6)$$
and
$$BIC = -2\,LL + c \log(N) \quad (7)$$
To choose a model, we fit mixture models with the EM algorithm for one to five components and chose the one with the smallest information criterion value (the degrees of freedom $c$ in the above formulas equals $3K - 1$ for $K$ components).
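As a sketch of this selection step, reusing the hypothetical fit_gamma_mixture from the previous sketch (the function name select_num_components is again illustrative):

```python
import numpy as np

def select_num_components(x, K_max=5):
    """Fit mixtures for K = 1..K_max and keep the smallest AIC and BIC.
    Degrees of freedom: c = 3K - 1 (2K distribution parameters plus
    K - 1 free mixing proportions)."""
    scores = {}
    for K in range(1, K_max + 1):
        ll, model = fit_gamma_mixture(x, K)   # from the sketch above
        c = 3 * K - 1
        aic = -2.0 * ll + 2.0 * c             # equation (6)
        bic = -2.0 * ll + c * np.log(len(x))  # equation (7)
        scores[K] = (aic, bic)
    best_aic = min(scores, key=lambda K: scores[K][0])
    best_bic = min(scores, key=lambda K: scores[K][1])
    return best_aic, best_bic, scores
```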
3 Experimental Results
3.1 Estimating Number of Components
We generated simulated datasets from mixture models with one, two,
and three components. We performed two sets of equivalent experiments, one
using the gamma and one using the lognormal distribution for the mixture model
components. For the component parameters, each distinct combination of the
 