Characterizing Genes by Marginal Expression Distribution - Advances in Computational Science and Engineering

Characterizing Genes by Marginal Expression Distribution

Edward Wijaya, Hajime Harada, and Paul Horton

Computational Biology Research Center

AIST Waterfront, Bio-IT Research Building

2-42 Aomi, Koto-ku, Tokyo 135-0064

horton-p@aist.go.jp

Abstract. We report the results of fitting mixture models to the distribution of expression values for individual genes over a broad range of normal tissues, which we call the marginal distribution of the gene. The base distributions used were normal, lognormal and gamma. The expectation-maximization algorithm was used to learn the model parameters. Experiments with artificial data were performed to ascertain the robustness of learning. Applying the procedure to data from two publicly available microarray datasets, we conclude that the lognormal distribution performed best for modeling the marginal distributions of gene expression. Our results should provide guidance in the development of informed priors or gene-specific normalization for use with gene network inference algorithms.

Keywords: microarray, marginal distributions, mixture models.

1 Introduction

Several studies have used finite mixture models to describe the distributions of gene expression values. Notable works include those by Hoyle [1] and Yuan [2]. Hoyle investigated the entire distribution of expression levels of mRNA extracted from human tissues. Yuan examined the distribution of correlation coefficients of gene expression in cancer cells. However, less analysis has been done on the marginal distribution of gene expression levels.

In this paper we present a preliminary analysis of modeling the marginal distributions using mixture models with normal, lognormal or gamma distributions as the model components. Compared with the previous works, this study attempts to answer the following questions:

1. Is there a generic form of distribution that best describes the marginal expression of genes?
2. Can we find what is common among the genes that have similar mixture components?
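To make the modeling task behind these questions concrete, the following is a minimal sketch of fitting a mixture with lognormal components by expectation-maximization. It is not the authors' implementation: the function name `fit_lognormal_mixture`, the quantile-based initialization, the stopping rule and the synthetic data are all assumptions chosen for illustration, and only the lognormal case is shown. The sketch relies on the fact that a lognormal mixture in the expression value x is a normal mixture in log x, and it assumes strictly positive expression values.

```python
import numpy as np

def fit_lognormal_mixture(x, k=2, n_iter=200, tol=1e-8):
    """EM for a k-component lognormal mixture (hypothetical helper).

    Returns weights, log-scale means, log-scale standard deviations,
    and the log-likelihood of x under the last evaluated parameters.
    """
    # Lognormal in x is normal in y = log(x); requires x > 0.
    y = np.log(np.asarray(x, dtype=float))
    n = y.size

    # Assumed initialization: means at evenly spaced quantiles,
    # a shared variance, and uniform mixing weights.
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)
    sigma2 = np.full(k, y.var() + 1e-6)
    w = np.full(k, 1.0 / k)

    prev_ll = -np.inf
    ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(w_j * N(y_i | mu_j, sigma2_j)) for every i, j.
        log_p = (np.log(w)
                 - 0.5 * np.log(2.0 * np.pi * sigma2)
                 - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities

        # Log-likelihood in the original scale: the change of variables
        # from y = log(x) back to x contributes -sum(log x) = -sum(y).
        ll = log_norm.sum() - y.sum()

        # M-step: responsibility-weighted updates.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sigma2 = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9

        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return w, mu, np.sqrt(sigma2), ll

# Synthetic "expression values" for one gene, made up for illustration:
# two lognormal components with log-scale means 1.0 and 4.0.
rng = np.random.default_rng(1)
x = np.concatenate([rng.lognormal(1.0, 0.3, 500),
                    rng.lognormal(4.0, 0.3, 500)])

w, mu, sigma, ll = fit_lognormal_mixture(x, k=2)
print("weights:", w)
print("log-scale means:", mu)
print("log-scale std devs:", sigma)
print("log-likelihood:", ll)
```

Comparing base distributions, as the first question asks, would require an analogous fit for normal and gamma components and a comparison of the resulting likelihoods. The gamma case has no closed-form M-step for the shape parameter, so it is left out of this sketch.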

The gamma and lognormal distributions belong to the family of skewed distributions. We expected that these distributions could model the microarray data
