
levels in gene probes. To our knowledge, this is the first study to provide such a framework for analyzing expression data.

Although gamma distributions are theoretically capable of modeling skewed data, our experiments showed that the lognormal distribution appears to be more suitable for modeling the marginal distribution of gene expression. We also showed that, of the two model selection criteria we used, BIC is more accurate in selecting the number of components for lognormal and gamma mixtures. AIC, on the other hand, tends to overestimate the number of components.
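The difference between the two criteria can be illustrated with a minimal sketch. It assumes each mixture component has two parameters (as lognormal and gamma components do) plus K - 1 free mixing weights, and it uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihood values and sample size below are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
import math


def mixture_param_count(n_components, params_per_component=2):
    """Free parameters of a mixture: component parameters plus (K - 1) weights."""
    return n_components * params_per_component + (n_components - 1)


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


def select_components(log_likelihoods, n_samples):
    """Return (best K by AIC, best K by BIC).

    log_likelihoods maps a component count K to the maximized log-likelihood
    of the K-component mixture fitted to the same n_samples observations.
    """
    aic_scores = {k: aic(ll, mixture_param_count(k))
                  for k, ll in log_likelihoods.items()}
    bic_scores = {k: bic(ll, mixture_param_count(k), n_samples)
                  for k, ll in log_likelihoods.items()}
    return min(aic_scores, key=aic_scores.get), min(bic_scores, key=bic_scores.get)


# Hypothetical maximized log-likelihoods for K = 1..4 (made-up numbers).
fits = {1: -1250.0, 2: -1200.0, 3: -1195.0, 4: -1193.5}
best_aic, best_bic = select_components(fits, n_samples=500)
print("AIC selects K =", best_aic)
print("BIC selects K =", best_bic)
```

With these made-up values, the small likelihood gain from a third component is enough to offset the AIC penalty but not the BIC penalty, because BIC charges ln n per parameter rather than 2. This is the general mechanism by which AIC can favor more components than BIC once n exceeds about 7 observations.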

We hypothesized that genes within the same functional category (e.g., transcription factors, kinases, structural proteins) may show similar marginal distributions. Unfortunately, this expectation is not clearly supported by our study. Only the single, vague Gene Ontology term "intracellular" was found to be over-represented in both datasets. We believe follow-up experiments are necessary to determine whether this is due to the quantity or quality of the expression data used, a deficiency in our methodology, or simply an incorrect hypothesis.
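The text does not state which test was used to assess over-representation. One common choice, shown here purely as an assumption, is a one-sided hypergeometric test: given a background of N genes of which M carry a term, it asks how likely it is to see at least k annotated genes in a group of n. The function name and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of background genes
    annotated:  background genes carrying the term
    drawn:      genes in the group being tested
    k:          genes in the group carrying the term
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favorable / total


# Hypothetical counts: 10,000 background genes, 3,000 annotated with a term,
# and a group of 50 genes of which 25 carry the term.
p_value = hypergeom_upper_tail(25, population=10_000, annotated=3_000, drawn=50)
print(f"one-sided p-value: {p_value:.3g}")
```

In practice many terms are tested at once, so a multiple-testing correction would normally be applied to p-values obtained this way.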

To achieve more definitive results, we are now preparing to analyze a much larger dataset that combines multiple GEO datasets. This will be essential for sampling the expression probes at the resolution needed to accurately model multimodal marginal distributions. Our results should provide some guidance in the development of informed priors or gene-specific normalization for use with gene network inference.

