that is often skewed [3]. Additionally, we use the standard normal distribution as a control experiment against which we compare how well the gamma and lognormal mixture models perform.
Our choice of the gamma distribution is motivated by its flexible shape; furthermore, it has been used successfully in many studies of biological systems [4,5,6]. With regard to the lognormal distribution, there is strong evidence that it appears in many biological phenomena [7]. In practice it is also convenient for analyzing microarray data because it makes calculations easy and allows the data to be expressed as z-scores, a possible common unit for data comparison [8]. Below we describe the details of our methods and experimental results.
2 Methods
2.1 Statistical Model
Let $\{x_i\}, i = 1, \ldots, N$ denote the expression values of a gene probe, where $N$ is the total number of observations (samples). Under a mixture model, the probability density function for observing the finite data points $x_i$ is:
$$p(x) = \sum_{j=1}^{K} p(x \mid j)\, P(j) \qquad (1)$$
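For illustration, here is a minimal R sketch of the mixture density in Eq. (1), using gamma components as an example. The function name mixture_density and the example parameter values are our own placeholders, not from the study:

# Mixture density of Eq. (1): p(x) = sum_j p(x|j) P(j), with gamma components.
# 'shape', 'rate', and 'pi' (the priors P(j)) are illustrative placeholders.
mixture_density <- function(x, shape, rate, pi) {
  dens <- vapply(seq_along(pi),
                 function(j) pi[j] * dgamma(x, shape = shape[j], rate = rate[j]),
                 numeric(length(x)))
  rowSums(matrix(dens, nrow = length(x)))
}

# Example: a two-component gamma mixture whose priors sum to 1
mixture_density(c(0.5, 1, 2), shape = c(2, 5), rate = c(1, 0.5), pi = c(0.4, 0.6))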
The density function for each component is denoted $p(x \mid j)$; in the appendix we give the formal descriptions of the density functions for the three types of distributions used in our model. $P(j)$ denotes the prior probability that a data point was generated from component $j$ of the mixture. These priors are chosen to satisfy the constraint $\sum_{j=1}^{K} P(j) = 1$. The negative log-likelihood function of the data is given by:
$$LL = -\log L = -\sum_{i=1}^{N} \log \sum_{j=1}^{K} p(x_i \mid j)\, P(j) \qquad (2)$$
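Continuing the sketch above, Eq. (2) then becomes a one-line negative log-likelihood (reusing the hypothetical mixture_density() from before):

# Negative log-likelihood of Eq. (2): -sum_i log sum_j p(x_i|j) P(j)
neg_log_lik <- function(x, shape, rate, pi) {
  -sum(log(mixture_density(x, shape, rate, pi)))
}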
We use the expectation-maximization (EM) algorithm [9] to learn mixture models of the normal, lognormal, and gamma distributions for each probe's expression level. It is implemented in the R programming language. The EM algorithm iteratively maximizes the log-likelihood and updates the conditional probability that $x$ comes from the $k$-th component. This is defined as

$$p(x \mid j)^{*} = E[\,p(x \mid j) \mid x, \theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K}\,] \qquad (3)$$
The set of parameters $[\theta_{A_1}, \theta_{B_1}, \ldots, \theta_{A_K}, \theta_{B_K}]$ is a maximizer of the log-likelihood for given $p(x \mid j)$. The EM algorithm iterates between an E-step, in which the values $p(x \mid j)^{*}$ are computed from the current parameter estimates, and an M-step, in which the log-likelihood, with each $p(x \mid j)$ replaced by its current conditional expectation $p(x \mid j)^{*}$, is maximized with respect to the parameters $\theta_A$ and $\theta_B$. These two steps are repeated until convergence.
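To make the E- and M-steps concrete, below is a minimal EM sketch in R for the gamma-mixture case. It is an illustration under our own assumptions, not the authors' implementation: the function em_gamma_mixture and its arguments are hypothetical, and because the gamma M-step has no closed form, the weighted log-likelihood is maximized numerically with optim():

# Minimal EM sketch for a K-component gamma mixture (illustrative only).
em_gamma_mixture <- function(x, shape, rate, pi, n_iter = 100, tol = 1e-6) {
  K <- length(pi)
  prev_nll <- Inf
  nll <- Inf
  for (iter in seq_len(n_iter)) {
    # E-step: responsibilities p(x|j)* of Eq. (3) from current parameters
    dens <- vapply(seq_len(K),
                   function(j) pi[j] * dgamma(x, shape = shape[j], rate = rate[j]),
                   numeric(length(x)))
    resp <- dens / rowSums(dens)
    # M-step: closed-form prior update; shape/rate by numerical optimization
    pi <- colMeans(resp)
    for (j in seq_len(K)) {
      fit <- optim(par = c(log(shape[j]), log(rate[j])),
                   fn = function(par) -sum(resp[, j] *
                          dgamma(x, shape = exp(par[1]), rate = exp(par[2]), log = TRUE)),
                   method = "BFGS")
      shape[j] <- exp(fit$par[1])
      rate[j]  <- exp(fit$par[2])
    }
    # Stop when the negative log-likelihood of Eq. (2) stabilizes
    nll <- -sum(log(rowSums(dens)))
    if (abs(prev_nll - nll) < tol) break
    prev_nll <- nll
  }
  list(pi = pi, shape = shape, rate = rate, nll = nll)
}

The same loop applies to the normal and lognormal mixtures by swapping dgamma() for dnorm() or dlnorm(); for those two, the M-step can instead use closed-form weighted moment updates.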