included in a gene-set (i.e., a miss). Also, $N_H$ is the sum of the ranking scores of
the genes included in the gene-set, $N_M$ is the sum of the ranking scores of the genes
not included in it, and $N$ is the total number of genes in the entire gene-list.
Finally, the ES of a gene-set is the value of $P_{hit} - P_{miss}$ with the largest
absolute value over all genes, as in eq. (3); that is, the ES is determined at the
point of maximum deviation from zero in Fig. 3.
Fig. 3. The ES is determined at the maximum deviation from zero [1]
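The ES computation described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: eqs. (1) to (3) are not reproduced in this section, so it assumes that $P_{hit}$ and $P_{miss}$ are running sums of absolute ranking scores normalized by $N_H$ and $N_M$, respectively, and that the gene-list is already sorted by ranking score in descending order. The function name `enrichment_score` is hypothetical. It returns the signed deviation of largest magnitude together with its position in the list.

```python
from typing import Sequence, Set, Tuple


def enrichment_score(ranked_genes: Sequence[str], scores: Sequence[float],
                     gene_set: Set[str]) -> Tuple[float, int]:
    """Return (ES, position of the maximum deviation from zero).

    ranked_genes must already be sorted by ranking score (descending).
    Assumption: P_hit and P_miss accumulate |score| and are normalized
    by N_H (sum over gene-set members) and N_M (sum over non-members).
    """
    weights = [abs(s) for s in scores]
    n_h = sum(w for g, w in zip(ranked_genes, weights) if g in gene_set)
    n_m = sum(w for g, w in zip(ranked_genes, weights) if g not in gene_set)
    if n_h == 0 or n_m == 0:
        raise ValueError("N_H and N_M must both be nonzero")

    p_hit = p_miss = 0.0
    es, peak = 0.0, -1
    for i, (g, w) in enumerate(zip(ranked_genes, weights)):
        if g in gene_set:
            p_hit += w / n_h   # hit: step up P_hit
        else:
            p_miss += w / n_m  # miss: step up P_miss
        deviation = p_hit - p_miss
        if abs(deviation) > abs(es):  # keep the sign of the largest deviation
            es, peak = deviation, i
    return es, peak


genes = ["g1", "g2", "g3", "g4", "g5"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, scores, {"g1", "g3"}))  # (0.75, 0)
```

Here the gene-set {g1, g3} sits near the top of the list, so the curve peaks at the first gene and the ES is positive; a gene-set made only of low-ranked genes would instead produce a negative ES.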
If the ES of a gene-set falls in the positive region, the significant genes for the
gene-set are taken from the left side of the maximum deviation point; that is, only
genes that are highly up-regulated in sample group A compared with sample group B
are taken as significant. Conversely, if the ES falls in the negative region, the
significant genes are taken from the right side of the maximum deviation point; that
is, only genes that are highly down-regulated in sample group A compared with sample
group B are taken as significant. For this reason, highly up-regulated and highly
down-regulated genes cannot both be chosen as significant for a specific gene-set at
the same time, which makes it hard to reflect situations that arise in biological
pathways (for example, a pathway in which some member genes are activated while
others are repressed). We therefore investigate a new gene ranking method for
gene-set enrichment analysis.
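The selection rule above can be illustrated with a small sketch. Under the same assumption about the running sum (absolute ranking scores normalized by $N_H$ and $N_M$), the hypothetical `significant_genes` function locates the maximum deviation point and keeps the gene-set members on its left for a positive ES, or on its right for a negative ES. Counting the peak position itself as part of either side is also an assumption.

```python
from typing import List, Sequence, Set


def running_deviation(ranked_genes: Sequence[str], scores: Sequence[float],
                      gene_set: Set[str]) -> List[float]:
    """P_hit - P_miss at every position of the ranked gene-list.

    Assumption: both running sums accumulate |score|, normalized by
    N_H (gene-set members) and N_M (non-members).
    """
    weights = [abs(s) for s in scores]
    n_h = sum(w for g, w in zip(ranked_genes, weights) if g in gene_set)
    n_m = sum(w for g, w in zip(ranked_genes, weights) if g not in gene_set)
    if n_h == 0 or n_m == 0:
        raise ValueError("N_H and N_M must both be nonzero")
    dev, out = 0.0, []
    for g, w in zip(ranked_genes, weights):
        # A hit raises the curve, a miss lowers it.
        dev += w / n_h if g in gene_set else -w / n_m
        out.append(dev)
    return out


def significant_genes(ranked_genes: Sequence[str], scores: Sequence[float],
                      gene_set: Set[str]) -> List[str]:
    """Gene-set members on the ES side of the maximum deviation point."""
    dev = running_deviation(ranked_genes, scores, gene_set)
    peak = max(range(len(dev)), key=lambda i: abs(dev[i]))
    if dev[peak] > 0:
        window = ranked_genes[: peak + 1]  # positive ES: left side (up-regulated)
    else:
        window = ranked_genes[peak:]       # negative ES: right side (down-regulated)
    return [g for g in window if g in gene_set]


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
print(significant_genes(genes, scores, {"g1", "g2", "g6"}))  # ['g1', 'g2']
print(significant_genes(genes, scores, {"g1", "g5", "g6"}))  # ['g5', 'g6']
```

In the first call the ES is positive, so the strongly down-regulated member g6 is dropped; in the second the ES is negative, so the strongly up-regulated member g1 is dropped. Either way, one direction of regulation is lost, which is the limitation described above.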
3.2 Fisher's Criterion Based Gene Ranking
Fisher's criterion (FC) is used as the ranking statistic for gene ranking. For a
gene $i$, the FC is given by
$$\mathrm{FC}(i) = \frac{\left(\mu_i^{(A)} - \mu_i^{(B)}\right)^2}{\left(\sigma_i^{(A)}\right)^2 + \left(\sigma_i^{(B)}\right)^2}. \tag{4}$$
where $\mu_i^{(A)}, \mu_i^{(B)}$ and $\sigma_i^{(A)}, \sigma_i^{(B)}$ are the means and
standard deviations of the expression intensities of gene $i$ in sample groups A and
B, respectively. When FC is used for gene