Characterizing Genes by
Marginal Expression Distribution
Edward Wijaya, Hajime Harada, and Paul Horton
Computational Biology Research Center
AIST Waterfront, Bio-IT Research Building
2-42 Aomi, Koto-ku, Tokyo 135-0064
horton-p@aist.go.jp
Abstract. We report the results of fitting mixture models to the
distribution of expression values for individual genes over a broad range
of normal tissues, which we call the marginal distribution of the gene.
The base distributions used were normal, lognormal and gamma. The
expectation-maximization algorithm was used to learn the model
parameters. Experiments with artificial data were performed to ascertain
the robustness of learning. Applying the procedure to data from two
publicly available microarray datasets, we conclude that the lognormal
distribution was the best base function for modeling the marginal
distributions of gene expression. Our results should provide guidance in
the development of informed priors or gene-specific normalization for use
with gene network inference algorithms.
Keywords: microarray, marginal distributions, mixture models.
1 Introduction
Several studies have used finite mixture models to describe the
distributions of gene expression values. Notable works include those by
Hoyle [1] and Yuan [2]. Hoyle investigated the entire distribution of
expression levels of mRNA extracted from human tissues. Yuan examined the
distribution of correlation coefficients of gene expression in cancer
cells. However, less analysis has been done on the marginal distribution
of gene expression levels.
In this paper we present a preliminary analysis of modeling the marginal
distributions using mixture models with normal, lognormal or gamma
distributions as the model components. Compared with the previous works,
this study attempts to answer the following questions:
1. Is there a generic form of distribution that best describes the
marginal expression of genes?
2. Can we find what is common amongst the genes that have similar mixture
components?
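To make the modeling setup concrete, the following is a minimal sketch of fitting a mixture of lognormal components to one gene's expression values with the expectation-maximization algorithm. It is not the authors' implementation: the function name `fit_lognormal_mixture`, the quantile-based initialization, the variance floor and the stopping rule are all assumptions made for illustration. The sketch relies on the fact that a lognormal mixture in x is a normal mixture in log x, so the M-step has a closed form; a gamma mixture would instead need a numerical update for the shape parameter.

```python
import numpy as np

def fit_lognormal_mixture(x, k=2, n_iter=500, tol=1e-8):
    """EM for a k-component lognormal mixture (illustrative sketch).

    Fits a normal mixture to y = log(x) and returns the mixing weights,
    the component parameters (mu, sigma) on the log scale, and the
    log-likelihood of the original values x.
    """
    y = np.log(np.asarray(x, dtype=float))  # requires strictly positive x
    n = y.size
    # Assumed initialization: means at evenly spaced quantiles of log(x),
    # a shared variance, and equal mixing weights.
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)
    var = np.full(k, y.var())
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(w_j * N(y_i | mu_j, var_j)) for every point and component.
        log_p = (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (y[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)  # log-density of each y_i
        resp = np.exp(log_p - log_norm[:, None])       # responsibilities, rows sum to 1
        # M-step: responsibility-weighted updates of weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * y[:, None]).sum(axis=0) / nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9  # variance floor
        ll = log_norm.sum()
        if ll - prev_ll < tol:  # stop once the log-likelihood no longer improves
            break
        prev_ll = ll
    # Change of variables: log f_X(x) = log f_Y(log x) - log x.
    return w, mu, np.sqrt(var), ll - y.sum()
```

Calling `fit_lognormal_mixture(values, k=2)` on a one-dimensional array of positive expression values returns the weights, log-scale means, log-scale standard deviations and the log-likelihood. Because the returned log-likelihood is expressed on the original scale, it could in principle be compared with fits that use normal or gamma components on the same data.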
The gamma and lognormal distributions belong to the family of skewed
distributions. We expected that these distributions could model the
microarray data
 