3.4 Multiple Hypothesis Test in FC-GSEA
Once the ES is obtained for each gene-set, it should be normalized to account for
the size of the gene-set, yielding a normalized ES (NES). Eq. (5) gives the formula
for calculating the NES.
$$
NES(S) = \frac{ES(S)}{\operatorname{Mean}_{\pi = 1, \ldots, \Pi} \left( ES_{\pi}(S) \right)} \qquad (5)
$$
Here ES(S) is the enrichment score for a specific gene-set S, and ES_π(S) (π = 1, …, Π) is
the ES for permutation π of that gene-set. Each permutation π (π = 1, …, Π) is obtained
by random sampling of gene-set S according to the sample labels, and a total of Π
permuted ESs are generated for each gene-set S. The number of possible permutations
grows exponentially with the number of samples, and Π = 1000 is generally used [1].
The significant gene-sets are identified by taking an appropriate number of gene-sets
from the list of candidate gene-sets arranged in decreasing order of NES. When an FWER
(Family-Wise Error Rate) test or an FDR (False Discovery Rate) test is used to identify
significant gene-sets, too few gene-sets may be selected at times. In such cases,
significant gene-sets can be identified with the NES instead.
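The normalization in eq. (5) and the ranking step described above can be sketched as follows. This is only a minimal illustration: the function names `normalized_es` and `top_gene_sets`, and the dictionary layout of the inputs, are assumptions made for this example and do not come from the original method. The permuted scores are assumed to be already computed.

```python
from statistics import mean


def normalized_es(es, permuted_es):
    """Return NES(S) = ES(S) / Mean_pi(ES_pi(S)), following eq. (5).

    es          -- observed enrichment score ES(S) of one gene-set
    permuted_es -- list of the Pi permuted scores ES_pi(S) for that gene-set
    """
    denom = mean(permuted_es)
    if denom == 0:
        # Eq. (5) is undefined when the permuted scores average to zero.
        raise ValueError("mean of permuted ES is zero; NES is undefined")
    return es / denom


def top_gene_sets(nes_by_set, k):
    """Return the names of the k gene-sets with the largest NES."""
    # Sort gene-set names by NES in decreasing order, then keep the first k.
    return sorted(nes_by_set, key=nes_by_set.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy values chosen only to show the call pattern, not real results.
    nes = {
        "set_A": normalized_es(1.0, [0.25, 0.75]),
        "set_B": normalized_es(0.3, [0.5, 0.5]),
    }
    print(top_gene_sets(nes, 1))
```

Ranking by NES in this way needs no significance threshold, which is why it remains usable when an FWER or FDR cut-off selects too few gene-sets.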
4 Experiments and Results
For our experiments, we used the Leukemia dataset [4], which contains expression
profiles of 7129 human genes for a total of 38 leukemia samples belonging to two
classes: 27 acute lymphoblastic leukemia (ALL) samples and 11 acute myeloid leukemia
(AML) samples. To identify significant pathways showing differential expression between
the ALL and AML classes, we first generated 167 candidate pathway gene-sets by taking
the pathways from the KEGG pathway databases [11, 12] that each include at least five
genes. Out of these pathways, the 40 most significant pathways were then found using
original GSEA and FC-GSEA, respectively, in the same way as in [1]. In addition, for
biological evaluation of the obtained pathways, a priori known leukemia-related
pathways were manually collected with reference to the literature [15, 16, 17], the
Genetic Association Database, and the KEGG pathway database, and then used as the
gold standard for biological verification.
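The candidate gene-set construction described above (keeping only pathways with at least five genes) can be sketched as follows. The function name `filter_gene_sets`, the dictionary input format, and the optional restriction to genes measured in the dataset are all assumptions made for illustration; the text does not state whether the five-gene minimum is counted before or after matching genes to the dataset.

```python
def filter_gene_sets(pathways, min_size=5, measured_genes=None):
    """Keep pathways that contain at least `min_size` genes.

    pathways       -- dict mapping pathway name to an iterable of gene ids
    min_size       -- minimum number of genes required (five in the text)
    measured_genes -- optional iterable of gene ids present in the dataset;
                      if given, only these genes are counted (an assumption)
    """
    kept = {}
    for name, genes in pathways.items():
        members = set(genes)
        if measured_genes is not None:
            # Count only genes that actually appear in the expression data.
            members &= set(measured_genes)
        if len(members) >= min_size:
            kept[name] = sorted(members)
    return kept


if __name__ == "__main__":
    # Hypothetical pathway and gene names, used only to show the call pattern.
    toy = {
        "pathway_1": ["g1", "g2", "g3", "g4", "g5"],
        "pathway_2": ["g1", "g2"],
    }
    print(filter_gene_sets(toy))
```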
4.1 Identification of Significant Genes
To identify significant genes for each of the candidate pathways, we applied the
original GSEA and FC-GSEA methods to the experimental dataset. As mentioned earlier,
because the two methods use different selection strategies and expression difference
metrics, the genes identified as differentially regulated between the ALL and AML
groups showed clear discrepancies between original GSEA and FC-GSEA. For example,
the significant genes identified from the gene-set of “B cell receptor signaling
pathway” are as follows.