Biology Reference
In-Depth Information
of motif values. This allows for the identification of (i) overpopulated motifs, and
(ii) genes sharing similar motif values. Hence we have achieved a fine-grained
“clustering” of the data where the number of potential clusters is dependent upon
the definition of the hashing function.
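As a rough illustration of how hashing yields this kind of fine-grained grouping, the sketch below maps each gene's expression profile to a motif value and then ranks motifs by population. The hashing function is not specified in this passage, so `toy_hash` (an up/down/flat encoding of consecutive time points), the gene names, and the profile values are all hypothetical stand-ins.

```python
from collections import defaultdict


def toy_hash(profile):
    """Hypothetical hash: encode each step of a profile as up (U), down (D) or flat (F)."""
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        if curr > prev:
            symbols.append("U")
        elif curr < prev:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


def group_by_motif(profiles, hash_fn=toy_hash):
    """Map each motif value to the list of gene names that hash to it."""
    motifs = defaultdict(list)
    for gene, profile in profiles.items():
        motifs[hash_fn(profile)].append(gene)
    return dict(motifs)


def rank_by_population(motifs):
    """Return (motif, genes) pairs, most populated motif first."""
    return sorted(motifs.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Made-up expression profiles, one value per time point.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 1.5],
    "geneB": [0.2, 0.9, 1.4, 1.0],
    "geneC": [1.0, 0.5, 0.1, 0.3],
}
motifs = group_by_motif(profiles)
ranked = rank_by_population(motifs)
```

A coarser or finer hash function changes how many distinct motif values can occur, which is the sense in which the number of potential clusters depends on the definition of the hashing function.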
(c) Quantification of transcription state. We define the transcriptional state of
the system as the CDF of expression values of a select subset of motifs (based on
the corresponding genes), and we track this quantity as it evolves over time
relative to the control state (the distribution at t=0 hr). We characterize each
motif by its ability to represent the overall transcription dynamics of the system.
The transcriptional state thus quantifies the deviation of the aggregate
distribution of expression values from a control state. An optimization
framework is defined that characterizes expression motifs by their strength
in replicating the behavior of the entire system. Thus, we are able to rank the
expression motifs by their contribution to the overall state change of the system.
The minimum number of expression motifs required to accurately represent the
dynamic response of the system defines the set of informative genes, i.e., genes
maximally affected by the specific experimental perturbation. To quantify the
hypothesis that informative subsets of genes should give rise to a distribution of
expression values maximally affected by the experiment, we employ the
Kolmogorov-Smirnov (KS) test [49], which evaluates whether or not two arbitrary
distributions differ. Informative
subsets are those able to capture significant deviations from the base
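distribution. The KS statistic is defined as:
$$D = \max_{1 \le i \le n} \left| F(Y_{g_i}) - F(Y_{g_i}(0)) \right|,$$
where $F(Y_{g_i}(0))$ is the cumulative distribution of the expression values at time t=0
and $F(Y_{g_i})$ is the corresponding distribution at the time point being compared.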
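To make the KS statistic concrete, the following minimal sketch computes the largest absolute gap between two empirical CDFs, one built from expression values at a later time point and one from the values at t=0. The function names and the sample values are illustrative assumptions, not part of the method described in [50].

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(values_t, values_0):
    """Largest absolute gap between the two empirical CDFs.

    Both CDFs are step functions, so it suffices to compare them
    at the pooled sample points.
    """
    f_t, f_0 = ecdf(values_t), ecdf(values_0)
    points = set(values_t) | set(values_0)
    return max(abs(f_t(x) - f_0(x)) for x in points)


# Made-up expression values for the same gene subset at t=0 and at a later time.
expr_t0 = [0.1, 0.2, 0.3, 0.4]
expr_t = [0.3, 0.4, 0.5, 0.6]
d = ks_statistic(expr_t, expr_t0)
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means they do not overlap at all.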
This statistic provides a metric for the magnitude of the difference between
two distributions. Since the data are presented as a time series, a value of the
KS statistic is obtained at each time point, and the overall metric combines these
per-time-point values. With the definition of the transcriptional state and the
ability to quantify deviations from the control (sham) state, we are now in a
position to define a rigorous methodology for selecting maximally informative
expression motifs. Applying the KS test over time allows us to quantify how
much the CDF of a particular subset of genes deviates from the corresponding
CDF at time t=0 (control/sham). We currently implement a greedy algorithm that
adds peaks in order of their population and selects the subset with the greatest
deviation. The greedy heuristic was chosen to limit the combinatorial complexity
of the problem, and we consider it an adequate approximation because the more
over-represented a motif is, the more important that motif is assumed to be.
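The greedy selection can be sketched as follows, under stated assumptions: motifs are added in decreasing order of population, the cumulative gene subset is scored after each addition, and the best-scoring subset is kept. The exact form of the overall metric is not given in this passage, so `deviation` assumes a plain sum of the per-time-point KS statistics against t=0. The motif labels, gene names, and data are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)

    def gap(x):
        return abs(bisect_right(a_sorted, x) / len(a_sorted)
                   - bisect_right(b_sorted, x) / len(b_sorted))

    return max(gap(x) for x in set(a) | set(b))


def deviation(genes, profiles):
    """Assumed overall metric: sum over t > 0 of the KS statistic against t = 0."""
    n_times = len(profiles[genes[0]])
    baseline = [profiles[g][0] for g in genes]
    return sum(
        ks_statistic([profiles[g][t] for g in genes], baseline)
        for t in range(1, n_times)
    )


def greedy_select(motifs, profiles):
    """Add motifs in decreasing population order; keep the best-scoring prefix."""
    ordered = sorted(motifs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    selected, best_genes, best_score = [], [], float("-inf")
    for _, genes in ordered:
        selected = selected + genes
        score = deviation(selected, profiles)
        if score > best_score:
            best_genes, best_score = list(selected), score
    return best_genes, best_score


# Hypothetical data: three responsive genes and two unresponsive ones.
profiles = {
    "g1": [0.0, 1.0, 2.0],
    "g2": [0.1, 1.1, 2.1],
    "g3": [0.2, 1.2, 2.2],
    "g4": [0.5, 0.5, 0.5],
    "g5": [0.6, 0.6, 0.6],
}
motifs = {"up": ["g1", "g2", "g3"], "flat": ["g4", "g5"]}
best_genes, best_score = greedy_select(motifs, profiles)
```

Only as many subsets as there are motifs are evaluated, rather than every possible combination of motifs, which is the reduction in combinatorial complexity that motivates the heuristic.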
A detailed discussion of the methodology is presented in [50]. In order to fully
explore the methods we focus on two distinct experimental protocols to assess the