Biology Reference
In-Depth Information
or physiological states are the effective composite of multiple linked gene/
protein networks, then an appreciation of the entire dataset, in a global
inclusive manner, is needed. One of the first mechanisms derived to assist
in this global appreciation of signaling function was the process of looking
for biological similarity between components (genes or proteins) in a dataset.
In this context, biological similarity can be assessed by comparing functional
annotations (derived from empirical data) linked to the gene/protein. One
of the earliest developed processes that allowed efficient classification of
gene/protein function was Gene Ontology (GO) (http://www.geneontology.org/index.shtml)
analysis. GO analysis attempts to create a
rational and physiologically/pharmacologically relevant appreciation of
large datasets via the identification of gene/protein clusters within the main
data corpus that are related to each other by function, by linkage in a
metabolic process, or by subcellular localization. 135 The number of these
associations and the strength of observing multiple factors possessing the same
associations within a large dataset provide the first level of “contextual”
relevance of the mass dataset. The three main GO classification categories
commonly used to cluster genes/proteins into related and biologically relevant
groups are biological process, molecular function, and cellular component.
Biological process, molecular function, and cellular component are all
attributes of genes, gene products, or gene product groups. Each of these may be
assigned independently to factors (genes or proteins) in a dataset. The
relationships between a given factor and its biological process, molecular
function, and cellular component terms are one-to-many. This reflects the biological
reality that a particular protein, for example, β-arrestin, may exert a function
in multiple processes, contain domains that carry out diverse molecular
functions, and participate in several alternative interactions with other proteins,
organelles, or locations in the cell. Currently employed gene/protein
ontological structures reflect the present state of biological
knowledge and therefore should be considered highly plastic.
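
The one-to-many relationship described above, and the idea of counting how many factors share an association, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the factor names, the term labels, and the helper functions are hypothetical placeholders rather than curated GO annotations or the interface of any GO tool. The hypergeometric tail probability is one commonly used way to express the "strength" of observing a shared association; the text does not prescribe a specific test.

```python
from collections import Counter
from math import comb

# Hypothetical annotations (placeholders, not real GO terms): each factor maps
# one-to-many onto terms in each of the three GO categories.
ANNOTATIONS = {
    "factor_A": {
        "biological_process": {"process_1", "process_2"},
        "molecular_function": {"function_1"},
        "cellular_component": {"component_1", "component_2"},
    },
    "factor_B": {
        "biological_process": {"process_1"},
        "molecular_function": {"function_1", "function_2"},
        "cellular_component": {"component_2"},
    },
    "factor_C": {
        "biological_process": {"process_3"},
        "molecular_function": {"function_2"},
        "cellular_component": {"component_1"},
    },
}


def term_counts(annotations, category):
    """Count how many factors in the dataset carry each term of one category."""
    counts = Counter()
    for terms_by_category in annotations.values():
        counts.update(terms_by_category.get(category, set()))
    return counts


def hypergeom_tail(k, n, K, N):
    """Probability of seeing at least k annotated factors in a dataset of n,
    when K of the N background factors carry the term (assumed test)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    counts = term_counts(ANNOTATIONS, "biological_process")
    for term, k in counts.most_common():
        # Background sizes (K=2, N=10) are made-up numbers for illustration.
        p = hypergeom_tail(k, n=len(ANNOTATIONS), K=2, N=10)
        print(term, k, round(p, 4))
```

Storing a set of terms per category keeps the three categories independent for each factor, which mirrors the statement that each may be assigned independently. Because the ontologies themselves change over time, a real analysis would load annotations from a dated release rather than hard-code them.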
GO annotation of datasets has been demonstrated to be vital for a variety
of applications, for example, genome sequencing, network modeling, text
data mining, and applied clinical situations. 136-139 The association of
the appropriate GO terms to a dataset corpus of significant factors is the first
step in the process by which the statistical elucidation of the most likely
clustering of the factors to a specific group of GO terms can predict biologically
relevant actions. To facilitate this computational process, a plethora of
excellent mathematical applications now exists to achieve this first level
of dataset functional analysis. 20 Clustering of functionally correlated factors