inquiry, quantitative analytical solution strategies to assist in the process
are emerging as critically important.
We now discuss the computational challenges of assessing the presence
of circadian rhythms in time series microarray data.
B. Computational Challenges
If we wish to study a circadian phenomenon, we need to collect data as
frequently as possible and for as long as possible. To detect genes
expressed in a circadian manner, a series of mRNA samples must be
collected over time. Challenges present at the experimental level,
however, always limit the number of data points that can be obtained for
the analysis of any time series.
Consider, for example, a study designed to examine circadian gene
expression patterns in mouse liver or pancreas, with experiments
performed in triplicate and with data points being collected every 4
hours for 48 hours. We would need to purchase and entrain 39 mice
(3 mice per 13 time points) under dark/light conditions over an
appropriately long period of time, before they are transferred into
constant dark at the beginning of the experiment. Three mice will be
sacrificed at the beginning of the experiment, and every 4 hours
thereafter, and their organs of interest excised. We would then need to
extract mRNA from the organs of each of the 39 mice, convert each
mRNA sample to labeled cDNA, and hybridize it to a microarray. Each
array must then be scanned and analyzed to obtain gene expression data.
Thus, there are two major
factors limiting the number of data points that can be feasibly
obtained in each time series. First, in most cases the labor and expense
involved in collecting each data point preclude sampling intervals
shorter than 4 hours. Second, because the circadian rhythm generally
damps under conditions of constant darkness (see
Ceriani et al. [2002]), the time window for the entire experiment will
likely be limited to about 48 hours.
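
The arithmetic behind the example design can be sketched in a few lines of Python. This is only an illustration: the function name `design_size` and its parameters are hypothetical, and it assumes sampling starts at t = 0 and that one animal is used per replicate per time point, as in the mouse study described above.

```python
def design_size(interval_h, duration_h, replicates):
    """Return (time_points, animals) for a sampling design that starts at t = 0.

    Hypothetical helper for illustration; assumes one animal per replicate
    per time point, as in the mouse example in the text.
    """
    if duration_h % interval_h != 0:
        raise ValueError("duration must be a whole multiple of the interval")
    # Count both endpoints: t = 0 and t = duration_h.
    time_points = duration_h // interval_h + 1
    animals = time_points * replicates
    return time_points, animals


# The design from the text: every 4 hours for 48 hours, in triplicate.
points, mice = design_size(interval_h=4, duration_h=48, replicates=3)
print(points, mice)  # 13 39

# Halving the interval to 2 hours nearly doubles the number of animals.
print(design_size(interval_h=2, duration_h=48, replicates=3))  # (25, 75)
```

Because every animal also implies one mRNA extraction, one labeling reaction, and one microarray, the animal count is a reasonable proxy for how quickly the cost grows as the sampling interval shrinks.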
The challenges for circadian analysis of gene chip-derived time series
are thus considerable, as data sets presented for analysis are typically
characterized by (1) extremely sparse determination (often only 13
points at a 4-hour sampling interval over 48 hours); (2) extremely high
dimensionality (on the order of 10^4 gene IDs per microarray in current
Affymetrix implementations); and (3) low replicate numbers (thus
limiting pointwise reliability, primarily because of the considerable
financial costs of multiple chips per experimental time point). The sparse
number of data points for each gene expression time series renders the
use of many conventional methods for rhythm analysis inappropriate
because such methods typically require much larger samples to generate
statistically significant results. Instead, idiosyncratic algorithms