Biology Reference
In-Depth Information
towards early diagnostics, and consequent informed deci-
sions on therapies. This will also find application in the
early detection and correction of imbalances in biotech-
nological processes (see below).
eliminate noise, making it possible to visualize and
interpret trends and patterns in large datasets [53].
2. Clustering methods [107] are used to identify subsets of
cellular components (e.g., transcripts, proteins, metabolites)
that exhibit similar trends across experiments. The
experimental conditions can also be clustered to identify
those with similar profiles/responses. Clustering methods are
often used for visualization of multidimensional datasets
[53,108].
3. Analysis of variance (ANOVA) and analysis of covariance
(ANCOVA) are often used to compare profiles from different
states [108] when more than one factor/variable is included in
the experimental design (e.g., effects of growth rate and/or
limiting nutrients on gene expression). The analysis is
usually done on a feature-by-feature basis (one
gene/protein/metabolite analysed at a time) to decide whether
each individual feature has been affected by each specific
change/perturbation during the experiment [53].
4. Comprehensive correlations between heterogeneous 'omics'
datasets. These can reveal fundamental relationships between
cellular components, for instance the limited correlation
between proteome and transcriptome levels in yeast, which
points to post-transcriptional regulation as a major
phenomenon regulating protein levels in eukaryotes
[53,110,111].
Comprehensive Data Analysis and Integration
Methods: State-of-the-Art
Computational investigation of biological networks and
their dysregulation involves the use and adaptation of
concepts from network theory [96-100] and network biology
([101]; http://www.nrnb.org/). In this section we review
bioinformatics methods and tools used in the construction and
characterization of biological networks, and their
integration with high-throughput 'omics' datasets in yeast
systems biology research.
Network datasets as a main source of primary
(a priori) information and analysis
A considerable amount of information on yeast metabolic,
regulatory and interaction networks is now available (e.g.,
Saccharomyces Genome Database, www.yeastgenome.org;
[16,17,102] and references therein; [7]). Pioneering studies
in yeast systems biology have exploited this information in
new experimental designs, data integration and interpretation
as a primary (a priori) source for the characterization and
annotation of new network components. For example, starting
from the known components of the galactose utilization
pathway and its regulation, and combining careful
experimental design with proper integration of the
transcriptome and proteome profiles of deletion mutants of
the pathway components, Ideker and coworkers were able to
identify new components and connections
Computational tools (e.g., MATLAB, www.mathworks.co.uk;
R project, http://www.r-project.org/) have many built-in
functions, specialized toolboxes and packages to carry out
the analyses referred to above. Specialized bioinformatics
tools also provide a large range of such methods to both
experts and less experienced users (e.g., GenePattern [112];
SIMCA-P+, www.umetrics.com; GeneSpring, Agilent
Technologies, www.genespring.com/; Partek, www.partek.com/).
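As a rough illustration of the feature-by-feature testing that such packages automate, the following Python sketch computes a one-way ANOVA F statistic for each gene. It is deliberately simplified: it handles a single factor (the experimental designs discussed in the text involve more than one), and it stops at the F statistic rather than deriving a p-value, which would require the F distribution that statistical packages supply. The function name `one_way_f`, the gene names and the expression values are hypothetical.

```python
def one_way_f(groups):
    """F statistic of a one-way ANOVA for a single feature.

    groups: one list of replicate measurements per experimental condition.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and more values than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation explained by the condition (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation among replicates (within groups).
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression values: three conditions, three replicates each.
profiles = {
    "geneA": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "geneB": [[4.0, 5.0, 6.0], [5.0, 4.0, 6.0], [6.0, 5.0, 4.0]],
}

# One test per feature, as in the feature-by-feature approach.
f_stats = {gene: one_way_f(groups) for gene, groups in profiles.items()}
for gene, f_value in f_stats.items():
    print(gene, round(f_value, 3))
```

In this toy input, geneA differs between conditions while geneB does not, so geneA receives the larger F value. With thousands of features tested one at a time, a multiple-testing correction would normally follow; that step is omitted here.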
The functional relevance of genes, transcripts, proteins
and/or metabolites with characteristic profiles/patterns is
interpreted first with the aid of the primary annotated
networks. Metabolic and regulatory networks provide the main
causal relationships between interacting genes, transcripts,
proteins and metabolites (e.g., see Figures 18.1 and 18.2).
When the levels of a group of transcripts, proteins or
metabolites change in response to a specific perturbation,
examining how the responsive components are connected to
each other in the biological networks can reveal the nature
of the response. Other tools are often used to extract
features of interest. For example, gene ontology (GO)
hierarchical trees (http://www.geneontology.org/) make it
possible to reveal relevant groups of genes (sharing a
specific function and/or participating in specific
biological processes, GO categories or networks) with a
significant biological role under the experimental
conditions/perturbations tested. Networks,
in the regulatory
pathway [103] .
New features from high-throughput data (e.g., transcriptome,
proteome) can be extracted using unsupervised (i.e., no
preconditions imposed) or supervised methods, followed by the
use of networks for data interpretation. After properly
selected pre-processing steps, specific to each technology
(e.g., data normalization differs between two-dye and
single-dye microarrays [104,105]), the following methods are
commonly used to summarize results from large datasets:
1. Principal components analysis (PCA) and partial least
squares (PLS) methods. These are two singular value
decomposition (SVD)-based methods that reduce the
dimensionality of the datasets and relate them to each other
[105]. A number of variations of these methods have been
developed to extract biologically relevant information from
high-dimensional and/or heterogeneous data. The datasets are
transformed to fewer dimensions that capture the largest
variation and