Furthermore, (automated) annotation of data and analysis results can
complement the statistical analyses by enriching the experimental data
with knowledge from external repositories (cf., e.g., [48, Chapter 7]).
For instance, literature databases like PubMed [322] can be searched to
find articles related to the identified differentially expressed genes, the
corresponding parts of the Gene Ontology [32] can be taken into account, or
pathway databases like KEGG [146] or cMAP [56] can be used to associate
experimental findings with the related pathways.
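To make the literature lookup concrete, the following is a minimal sketch of how a PubMed search request for a list of gene names could be assembled. It is written in plain Python rather than R, and it assumes the public NCBI E-utilities ESearch endpoint as the access route, which the text does not specify. The function name, the [Title/Abstract] field restriction, and the gene symbols are illustrative assumptions. The sketch only builds the request URL and does not contact the service.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed access route; not named in the text).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(gene_symbols, max_results=20):
    """Return an ESearch URL for articles mentioning any of the given genes."""
    if not gene_symbols:
        raise ValueError("at least one gene symbol is required")
    # Join the symbols with OR so that a hit on any single gene is returned,
    # and restrict each one to the title/abstract fields (an illustrative choice).
    term = " OR ".join(f"{symbol}[Title/Abstract]" for symbol in gene_symbols)
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # the combined search expression
        "retmax": max_results,  # upper bound on the number of returned IDs
        "retmode": "json",      # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical gene symbols, used only to illustrate the call.
url = build_pubmed_query(["TP53", "BRCA1"], max_results=5)
print(url)
```

Fetching the resulting URL would return the PubMed identifiers of matching articles. A real analysis would also have to respect the usage limits of the service.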
6.1.3 Bioconductor
As detailed in the previous section, the interpretation of microarray data
is predominantly carried out via statistical analysis methods. Different
commercial and academic software frameworks provide support for performing
statistical analyses of microarray data. In academia, the statistics language
GNU R [18] in conjunction with the specialized R packages provided by the
Bioconductor project [105, 1] is particularly popular.
GNU R is a widely used programming language and software environment
for statistical data analysis and visualization. R has a rich collection of
standard functions for commonly required functionality, covering, for
instance, linear and nonlinear modeling, classical statistical tests,
classification and clustering. R's range of functionality can easily be
extended by user-created packages. Packages can be made available to the
user community by submitting them to one of the R package repositories,
such as the Comprehensive R Archive Network (CRAN) [16] for all kinds of
packages, or Bioconductor specifically for bioinformatics packages.
Bioconductor comprises comprehensive libraries of functions and meta-data
predominantly for the analysis of data from high-throughput genomics and
molecular biology experiments, and additionally provides several example
data sets that are useful for testing, benchmarking and demonstration
purposes.
Various (parts of) microarray data analysis procedures have been described
in the literature (cf., e.g., [48, 125]) and in the reference manuals and
additional manuscripts provided with the packages at the Bioconductor web
site [1]. In order to illustrate how these analyses can be carried out using
R and Bioconductor, this section discusses a simple example microarray data
analysis procedure (inspired by [48, Chapter 25]) that uses several common
Bioconductor packages. The starting point for the analysis is a set of
Affymetrix CEL files [22] that have been obtained from a microarray scanner
(cf. Section 6.1.1). After preprocessing and filtering of these raw
probe-level data, a differential expression analysis is carried out. Finally,
the literature database PubMed [322] is queried for relevant articles using
the names of the top differentially expressed genes as search keywords.
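The filtering and differential expression steps of this procedure can be illustrated with a small, language-neutral sketch. It is written in plain Python and is not the Bioconductor code the example procedure relies on. As stand-ins it uses a simple variance threshold for filtering and Welch's t-statistic for ranking. The function names, the threshold value, and the intensity values are made up for illustration. Reading CEL files and probe-level preprocessing (background correction, normalization, summarization) are outside its scope.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    # Note: this fails if both groups have zero variance; the toy data avoid that.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(intensities, n_control, min_variance=0.05):
    """Log-transform, drop near-constant genes, and rank the rest by |t|.

    `intensities` maps a gene name to its per-array intensity values, with
    the first `n_control` values belonging to the control arrays.
    """
    ranked = []
    for gene, values in intensities.items():
        logged = [log2(v) for v in values]
        # Nonspecific filtering: skip genes that barely vary across all arrays.
        if variance(logged) < min_variance:
            continue
        control, treated = logged[:n_control], logged[n_control:]
        ranked.append((gene, welch_t(treated, control)))
    # The largest absolute t-statistic comes first.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Made-up intensities: three control arrays followed by three treated arrays.
toy = {
    "geneA": [100, 110, 95, 400, 420, 390],   # clearly higher in treated arrays
    "geneB": [200, 205, 198, 202, 199, 204],  # almost constant, gets filtered out
    "geneC": [300, 150, 600, 310, 140, 620],  # variable, but similar in both groups
}
ranked = rank_genes(toy, n_control=3)
top_genes = [gene for gene, _ in ranked]
print(top_genes)
```

The names at the head of such a ranking are what the final step of the procedure would pass to the PubMed query as search keywords.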
 