Identifying Differentially Expressed Genes
very similar studies where differential expression
refers to similar pairs of conditions across studies.
This is an important limitation.
Recently, several proposals (Hong and Breitling,
2008, Hu et al., 2006, Rhodes et al., 2002,
Kim et al., 2007) have offered potential solutions
to the challenges of computational
meta-analysis. Rhodes et al. (2002)
performed a meta-analysis using Fisher's inverse χ²
test, combining the p-values obtained from
four datasets on prostate cancer. They sought to
determine which genes are differentially expressed
between clinically localized prostate cancer and benign
prostate tissue. The method validated and confirmed sets
of genes that were consistently significant across studies. (Choi
and Kim, 2003) used a standard t-test statistic,
defined as an effect size for identifying differentially
expressed genes, as the summary statistic for
each gene in each individual dataset. They
then proposed a hierarchical modelling approach
to assess both intra- and inter-study variation in
the summary statistic across multiple datasets.
This model-based method estimated an overall
effect size as the measurement of the magnitude
of differential expression for each gene through
parameter estimation and model fitting.
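
To make the p-value combination step concrete, the following is a minimal sketch of Fisher's inverse χ² method for a single gene. It assumes the per-study p-values are independent; the function name fisher_combined_pvalue and the example p-values are hypothetical and are not taken from Rhodes et al. (2002).

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene with Fisher's method."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p_i); under the null hypothesis
    # it follows a chi-square distribution with 2k degrees of freedom.
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For an even number of degrees of freedom (2k) the chi-square upper
    # tail has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene in four independent studies.
combined = fisher_combined_pvalue([0.04, 0.10, 0.03, 0.20])
print(f"combined p-value: {combined:.4g}")
```

In practice the function would be applied gene by gene, followed by a multiple-testing correction over all genes; that step is omitted here.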
Besides these two approaches based on p-values
and effect sizes, (Hu et al., 2006) compared a
quality-weighted strategy with the traditional
quality-unweighted strategy, and examined how
the quality weights influence two commonly
used meta-analysis methods: combining p-values
and combining effect size estimates. This study
demonstrated that the quality-weighted strategy
can lead to larger statistical power for identifying
differentially expressed genes than the quality-
unweighted strategy and that the combination of
multiple datasets identifies many more differen-
tially expressed genes than individual analysis of
either of the datasets.
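
The contrast between quality-weighted and quality-unweighted combination can be illustrated with a small sketch for effect sizes. This is not the specific weighting scheme of Hu et al. (2006): it assumes a simple fixed-effect, inverse-variance combination in which each study's weight is additionally multiplied by a quality score between 0 and 1. The function name combine_effect_sizes, the quality scores, and the example numbers are all hypothetical.

```python
import math


def combine_effect_sizes(effects, variances, quality=None):
    """Return (estimate, standard error, z-score) for one gene.

    effects   -- per-study effect size estimates
    variances -- per-study variances of those estimates (must be > 0)
    quality   -- optional per-study quality scores (assumed in [0, 1]);
                 None gives the quality-unweighted combination
    """
    k = len(effects)
    if k == 0 or len(variances) != k:
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    if quality is None:
        quality = [1.0] * k
    if len(quality) != k:
        raise ValueError("quality must have one score per study")
    # Inverse-variance weight, scaled by the (assumed) quality score.
    weights = [q / v for q, v in zip(quality, variances)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one study needs a positive weight")
    estimate = sum(w * d for w, d in zip(weights, effects)) / total
    # Variance of a weighted mean of independent estimates.
    variance = sum(w * w * v for w, v in zip(weights, variances)) / (total * total)
    se = math.sqrt(variance)
    return estimate, se, estimate / se


# Hypothetical gene measured in three studies.
effects = [0.8, 1.1, 0.2]
variances = [0.10, 0.15, 0.40]
unweighted = combine_effect_sizes(effects, variances)
weighted = combine_effect_sizes(effects, variances, quality=[0.9, 0.8, 0.3])
print("unweighted:", unweighted)
print("quality-weighted:", weighted)
```

Down-weighting a low-quality study moves the combined estimate toward the studies judged more reliable; whether this raises power depends on how well the quality scores reflect the actual measurement quality.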
In summary, p-values and effect size approach-
es that have been employed in meta-analyses for
differential gene expression analysis are fitted to
Approaches in Classification
Data mining and machine learning methods
including classification methods and clustering
techniques have been proposed to analyse multiple
expression datasets (Jiang et al., 2004, Lee et al.,
2004). (Jiang et al., 2004) used a Random Forest
method and Fisher's Linear Discriminant (FLD)
to select lung adenocarcinoma marker genes from
two different gene expression data sets in order
to distinguish normal samples from patient samples. Fisher's
Linear Discriminant is a traditional, computationally
efficient classification method,
while Random Forest is an ensemble method that grows a
set of decision trees on bootstrapped samples.
(Lee et al., 2004) presented a co-expression link
method for studying the functional relevance
and reproducibility of co-expression patterns,
based on a large-scale analysis of mRNA
co-expression in 60
large human data sets (3924 microarrays) from
the Stanford Microarray Database (SMD) and
the Gene Expression Omnibus. The principle of
the co-expression link method is that a pair of genes is
linked when their expression profiles are correlated, and
the link is trusted when it recurs in at least two data sets. After filtering, each gene
expression profile was compared to all others us-
ing the standard Pearson correlation coefficient.
A co-expression link between two genes was
confirmed if the link was observed in more than
one data set. The authors found that a substantial
number of correlated expression patterns occur
in multiple independent data sets. This confirma-
tion of correlated expression provides a useful
way to improve the confidence in any particular
correlated expression pattern. This study showed
that co-expression patterns that are confirmed are
more likely to be functionally relevant.
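
The confirmation rule described above can be sketched as follows. This is a simplified illustration, not the procedure of Lee et al. (2004): it assumes a link exists within a data set when the Pearson correlation of two profiles reaches a fixed cut-off, and both the cut-off of 0.9 and the names pearson and confirmed_links are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def confirmed_links(datasets, threshold=0.9, min_datasets=2):
    """Return gene pairs linked in at least `min_datasets` data sets.

    datasets -- list of dicts mapping gene name -> expression profile;
                profiles only need equal length within one data set
    """
    counts = Counter()
    for profiles in datasets:
        # Compare every gene profile with all others in this data set.
        for g1, g2 in combinations(sorted(profiles), 2):
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                counts[(g1, g2)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_datasets}


# Toy data: two data sets with different numbers of arrays.
ds1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2], "D": [1, 2, 3, 5]}
ds2 = {"A": [5, 6, 7], "B": [1, 2, 3], "C": [3, 2, 1], "D": [2, 1, 2]}
links = confirmed_links([ds1, ds2])
print(links)
```

In the toy data, genes A and D are strongly correlated in the first data set only, so that link is not confirmed, whereas the A and B link recurs in both data sets.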
Table 3 summarizes all of these existing methods
for meta-analyses.