some bias because its global background adjustment does not completely
remove nonspecific binding. A modified RMA, GeneChip RMA (GCRMA),
has been introduced to improve the accuracy of RMA without much sacrifice
in precision [31]. This new approach combines the strengths of
stochastic-model algorithms and physical models.
These approaches can be referred to as gene-level approaches,
in which the probe-level expression data are summarized into gene-level
measures, which are then used for statistical analysis. One advantage of
this strategy is that the data dimensions are reduced to a manageable scale.
Standard statistical methods can be used to select differentially expressed
genes. For example, Significance Analysis of Microarrays (SAM) has
been widely used as a statistical technique for finding significant genes in
a set of microarray experiments [32]. This approach assigns each gene a
d-score, the ratio of the fold-change to a modified standard deviation
(the standard deviation plus an exchangeability factor). For genes with
scores above an adjustable threshold, permutations of the repeated
measurements are used to estimate the false discovery rate (FDR), a
measure of error under multiple comparisons.
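
The d-score and the permutation estimate of the FDR described above can be
sketched as follows. This is a simplified illustration, not the published SAM
procedure: it assumes two groups of log-scale gene-level measures, uses a
pooled standard error as the "standard deviation", takes the exchangeability
factor s0 as a user-supplied constant, and applies a single symmetric cutoff
on |d|. The function names d_scores and permutation_fdr and all parameters
are hypothetical.

```python
import numpy as np


def d_scores(x, y, s0):
    """Per-gene d-score: mean difference over (standard error + s0).

    x, y: arrays of shape (genes, replicates) on a log scale, so the
    difference in means plays the role of the (log) fold-change.
    s0: exchangeability factor (assumed to be chosen by the user).
    """
    n1, n2 = x.shape[1], y.shape[1]
    diff = y.mean(axis=1) - x.mean(axis=1)
    # Pooled within-group sum of squares for each gene.
    ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss += ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    # Standard error of the difference in group means.
    s = np.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return diff / (s + s0)


def permutation_fdr(x, y, s0, threshold, n_perm=200, seed=0):
    """Estimate the FDR among genes with |d| >= threshold by permutation.

    Sample labels are shuffled n_perm times; the average number of genes
    passing the threshold under shuffled labels estimates the number of
    false calls. Returns (number of genes called, estimated FDR).
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n1 = x.shape[1]
    n_called = int((np.abs(d_scores(x, y, s0)) >= threshold).sum())
    if n_called == 0:
        return 0, float("nan")
    false_calls = []
    for _ in range(n_perm):
        perm = rng.permutation(data.shape[1])
        px, py = data[:, perm[:n1]], data[:, perm[n1:]]
        false_calls.append((np.abs(d_scores(px, py, s0)) >= threshold).sum())
    fdr = min(1.0, float(np.mean(false_calls)) / n_called)
    return n_called, fdr
```

Raising the threshold in permutation_fdr calls fewer genes and typically
lowers the estimated FDR, which is the trade-off the adjustable threshold
is meant to expose.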
Since hundreds or thousands of genes are tested simultaneously, simply
using the significance level as a p-value cutoff without adjusting for multiple
tests will increase the chance of false positives. The traditional
multiple-testing procedure is to control the family-wise error (FWE) rate [33].
However, the FWE approach tends to screen out all genes except those with
extreme differential expression when the number of genes becomes large, as in
the case of microarray experiments. The false discovery rate (FDR) [32, 34]
offers a less stringent alternative because it uses the expected proportion
of false rejections among all rejections as an error measure. No matter which
criterion is used, the level of significance should be chosen according to the
objective of the experiment. For instance, if the objective is to identify a
small number of truly differentially expressed genes, then a stringent
criterion such as controlling either the family-wise error rate or the false
discovery rate may be appropriate. On the other hand, for prediction purposes
in genomic/genetic profiling studies, the omission of informative genes in the
development of a predictive classifier generally has a much more serious
consequence for predictive accuracy than the inclusion of non-informative
genes. In such cases, stringent control of false positives may not be
essential. In this chapter, we will not consider multiple comparison
adjustment.
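
Although such adjustments are not pursued further here, the contrast between
the two error criteria can be made concrete with a small sketch. As an
illustration of the problem, testing 10,000 genes that are all truly null at
an unadjusted level of 0.05 would give about 500 false positives on average.
The code below compares a Bonferroni correction (one common way to control
the FWE rate) with the Benjamini-Hochberg step-up procedure (one common way
to control the FDR). Both procedures and the function names are assumptions
chosen for illustration; the text above does not name specific procedures.

```python
import numpy as np


def bonferroni_reject(pvals, alpha=0.05):
    """Reject hypotheses with p <= alpha / m, which controls the FWE rate."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure, which controls the FDR.

    Find the largest rank k with p_(k) <= alpha * k / m and reject the
    hypotheses with the k smallest p-values.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    below = ranked <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # zero-based index of the largest rank
        reject[order[: k + 1]] = True
    return reject
```

Because the Benjamini-Hochberg bound alpha * k / m is never smaller than the
Bonferroni bound alpha / m, every gene rejected by Bonferroni is also
rejected by Benjamini-Hochberg, which is one way to see why FDR control is
the less stringent of the two.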
There are some limitations in using gene-level data for analysis. For
example, gene expressions obtained from oligonucleotide arrays often show