into common GO term groups can be used to infer which specific signaling
functions the genes/proteins may be performing. The coexpression of these
factors, together with the most common similarities in their functional GO
term annotations, can indicate a potential predictive output of the dataset.
Therefore,
the main goal of mass analytical experimentation is the generation of
differential datasets that, once experimental variables are isolated, can be
linked to a biochemical function, a physiological response, or even an
organismal phenotype. The creation of this functional signaling signature of
the dataset allows
the correlation of factor expression with resultant function, with the most
strongly enriched factor clusters in the dataset being more reliably linked
to the resultant output. The signature of a dataset is often determined by
which GO terms are differentially represented, that is, which terms occur
significantly more or less often than expected by chance within the dataset
compared with their representation in a reference whole genome/protein
set. 140,141 The most commonly
applied approach for this is the calculation of “enrichment” for each GO
term. Dataset enrichment is thus demonstrated by significantly populated GO
term clusters, which arise when groups of related genes/proteins occur in
the experimental dataset at a frequency (scaled according to relative
dataset sizes) greater than that in the reference dataset. GO analysis can
be considered a relatively simple two-way analysis in which a simple
gene/protein-to-GO term association is used.
GO term annotation therefore generates a series of simple
gene/protein-associated functional statements. To improve upon this,
pathway enrichment analysis mechanisms that generate more signaling
pathway-focused output have been developed.
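
The enrichment calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any particular tool: the text does not name a statistical test, so a one-sided hypergeometric test is assumed here, and the function names, gene names, and term labels are all hypothetical. For each GO term it compares the term's frequency in the experimental dataset with its frequency in the reference set, scaled by the two set sizes.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    # Term frequency in the dataset (k/n) divided by its frequency in the
    # reference set (K/N), written as a single ratio.
    return (k * N) / (n * K)


def enrichment_p_value(k, n, K, N):
    # Assumed test: one-sided hypergeometric tail, i.e. the probability of
    # seeing k or more term-annotated genes when n genes are drawn without
    # replacement from a reference of N genes, K of which carry the term.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def term_enrichment(dataset_genes, annotations):
    # annotations: dict mapping every reference gene to its set of GO terms
    # (a simple gene/protein-to-GO term association).
    N = len(annotations)
    hits = sorted(set(dataset_genes) & set(annotations))
    n = len(hits)
    results = {}
    for term in {t for g in hits for t in annotations[g]}:
        K = sum(1 for terms in annotations.values() if term in terms)
        k = sum(1 for g in hits if term in annotations[g])
        results[term] = (fold_enrichment(k, n, K, N),
                         enrichment_p_value(k, n, K, N))
    return results


# Hypothetical toy reference set: 10 genes, 4 annotated "signaling",
# the remaining 6 annotated "metabolism".
toy_annotations = {f"gene{i}": {"signaling"} for i in range(1, 5)}
toy_annotations.update({f"gene{i}": {"metabolism"} for i in range(5, 11)})

# Hypothetical experimental dataset of 3 genes.
toy_results = term_enrichment(["gene1", "gene2", "gene5"], toy_annotations)
for term, (fold, p) in sorted(toy_results.items()):
    print(f"{term}: fold enrichment = {fold:.2f}, p = {p:.3f}")
```

A fold enrichment above 1 means the term is more frequent in the dataset than in the reference. When many GO terms are tested at once, the resulting p-values would normally also be corrected for multiple testing, which this sketch omits.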
Gene set enrichment analysis (GSEA) is a statistical approach used to
determine whether a functional pathway class is significantly represented
within a selection of factors from a heterogeneous gene/protein dataset.
There are multiple freely available pathway databases and
easy-to-use calculation programs that now handle these computational tasks
for molecular biologists. 20 As with most fields of applied technology,
subsequent iterations and developments can quickly surpass earlier
techniques. In recent years, the use of GSEA has been largely replaced by
a parametric version of this process (parametric analysis of gene set
enrichment, PAGE 142 ). GSEA employs a distribution-free, nonparametric
approach to assess how significantly the input dataset populates signaling
pathways (at least two factors per pathway are typically required for that
pathway to be effectively “populated”). PAGE and other parametric GSEA
tools rely on the Central Limit Theorem, which states that “when the sampling size is