that is often skewed [3]. Additionally, we use the standard normal distribution as a control experiment against which we compare how well the gamma and lognormal mixture models perform.
We chose the gamma distribution because of its flexible shape; furthermore, it has been used successfully in many studies of biological systems [4,5,6]. With regard to the lognormal distribution, there is strong evidence that it appears in many biological phenomena [7]. In practice it is also convenient for analyzing microarray data, because calculations are easy to perform and the data can be expressed as z-scores, a possible common unit for data comparison [8]. Below we describe our methods and experimental results in detail.
2 Methods
2.1 Statistical Model
Let $\{x_i\}, i = 1, \ldots, N$ denote the expression values of a gene probe, where $N$ is the total number of observations (samples). Under a mixture model, the probability density function for observing the finite data points $x_i$ is:

$$ p(x) = \sum_{j=1}^{K} p(x \mid j)\, P(j) \qquad (1) $$
The density function for each component is denoted as $p(x \mid j)$. In the appendix we give the formal description of the density functions for the three types of distributions used in our model. $P(j)$ denotes the prior probability that a data point was generated from component $j$ of the mixture. These priors are chosen to satisfy the constraint $\sum_{j=1}^{K} P(j) = 1$. The negative log-likelihood of the data is given by:
$$ LL = -\log L = -\sum_{i=1}^{N} \log \sum_{j=1}^{K} p(x_i \mid j)\, P(j) \qquad (2) $$
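To make equations (1) and (2) concrete: the paper's implementation is in R, but the mixture likelihood itself is straightforward to sketch. The following Python snippet (our own illustration; function names and the example parameters are ours, not the paper's) evaluates the negative log-likelihood of data under a finite mixture of SciPy distributions:

```python
import numpy as np
from scipy import stats

def mixture_nll(x, components, priors):
    """Negative log-likelihood of data x under a finite mixture (eq. 2).

    components: list of K frozen scipy.stats distributions, one per component j
    priors: mixing proportions P(j), which must sum to 1
    """
    # densities[i, j] = p(x_i | j)
    densities = np.column_stack([c.pdf(x) for c in components])
    # p(x_i) = sum_j p(x_i | j) P(j);  LL = -sum_i log p(x_i)
    return -np.sum(np.log(densities @ np.asarray(priors)))

# Illustrative example: a two-component lognormal mixture with made-up parameters
rng = np.random.default_rng(0)
x = np.concatenate([rng.lognormal(0.0, 0.5, 200), rng.lognormal(1.5, 0.3, 100)])
comps = [stats.lognorm(s=0.5, scale=np.exp(0.0)),
         stats.lognorm(s=0.3, scale=np.exp(1.5))]
nll = mixture_nll(x, comps, [2 / 3, 1 / 3])
```

The same function works unchanged for normal or gamma components, since only the frozen `scipy.stats` objects differ.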
We use the expectation-maximization (EM) algorithm [9] to learn mixture models of normal, lognormal, and gamma distributions for each probe's expression level. It is implemented in the R programming language. The EM algorithm iteratively maximizes the log-likelihood and updates the conditional probability that $x$ comes from each of the $K$ components. This is defined as
$$ \hat{p}(x \mid j) = E\left[\, p(x \mid j) \;\middle|\; x, \theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K} \,\right] \qquad (3) $$

The set of parameters $[\theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K}]$ is a maximizer of the log-likelihood for given $p(x \mid j)$. The EM algorithm iterates between an E-step, in which the values $\hat{p}(x \mid j)$ are computed from the current parameter estimates, and an M-step, in which the log-likelihood with each $p(x \mid j)$ replaced by its current conditional expectation $\hat{p}(x \mid j)$ is maximized with respect to the parameters $\theta_A$ and $\theta_B$. These two
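The E-step/M-step alternation described above can be sketched compactly. The paper's implementation is in R; the following Python version (our own minimal sketch, shown for the normal-mixture case, where the M-step has closed-form updates) is illustrative only — all names and the initialization scheme are ours:

```python
import numpy as np
from scipy import stats

def em_normal_mixture(x, K=2, n_iter=100, seed=0):
    """EM for a K-component univariate normal mixture.

    Returns the mixing priors P(j), component means, and standard deviations.
    """
    rng = np.random.default_rng(seed)
    # crude initialization: K random data points as means, pooled sd, uniform priors
    mu = rng.choice(x, K, replace=False)
    sd = np.full(K, x.std())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] ∝ P(j) p(x_i | j)  (cf. eq. 3)
        dens = np.column_stack([stats.norm(mu[j], sd[j]).pdf(x) for j in range(K)])
        r = dens * pi
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximize the expected log-likelihood w.r.t. the parameters
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd

# Illustrative run on synthetic two-component data
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])
pi, mu, sd = em_normal_mixture(x)
```

For gamma or lognormal components, the M-step no longer has a simple closed form and the expected log-likelihood must be maximized numerically; the E-step is unchanged.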
 