Microarray Data Analysis Pipelines - User-Level Workflow Design: A Bioinformatics Perspective

Furthermore, (automated) annotation of data and analysis results can complement the statistical analyses by enriching the experimental data with knowledge from external repositories (cf., e.g., [48, Chapter 7]). For instance, literature databases like PubMed [322] can be searched for articles related to the identified differentially expressed genes, the corresponding parts of the Gene Ontology [32] can be taken into account, or pathway databases like KEGG [146] or cMAP [56] can be used to associate experimental findings with the related pathways.
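As a rough illustration of the literature lookup mentioned above, the following sketch builds a PubMed search URL from a list of gene symbols. It is written in Python rather than R and is not taken from the source. It assumes the public NCBI E-utilities ESearch endpoint as the access route (the text names only PubMed, not a specific interface), and the function name `pubmed_search_url` and the gene symbols are hypothetical. No network request is made; the code only constructs the query.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumption: the text does not name an interface).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(gene_symbols, max_results=20):
    """Build an ESearch URL for articles mentioning any of the given genes."""
    # Join the gene symbols with OR so that a hit on any single gene counts.
    term = " OR ".join(f"{symbol}[Title/Abstract]" for symbol in gene_symbols)
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical gene symbols, chosen only for illustration.
url = pubmed_search_url(["TP53", "BRCA1"])
print(url)
```

Fetching the resulting URL would return a list of PubMed identifiers, which could then be attached to the corresponding genes as annotation.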

6.1.3 Bioconductor

As detailed in the previous section, the interpretation of microarray data is predominantly carried out via statistical analysis methods. Different commercial and academic software frameworks provide support for performing statistical analyses of microarray data. In academia, the statistics language GNU R [18] in conjunction with the specialized R packages provided by the Bioconductor project [105, 1] is particularly popular.

GNU R is a widely used programming language and software environment for statistical data analysis and visualization. R has a rich collection of standard functions for commonly required functionality, covering, for instance, linear and nonlinear modeling, classical statistical tests, classification and clustering. R's range of functionality can easily be extended by user-created packages. Packages can be made available to the user community by submitting them to one of the R package repositories, such as the Comprehensive R Archive Network (CRAN) [16] for all kinds of packages, or Bioconductor specifically for bioinformatics packages. Bioconductor comprises comprehensive libraries of functions and meta-data predominantly for the analysis of data from high-throughput genomics and molecular biology experiments, and additionally provides several example data sets that are useful for testing, benchmarking and demonstration purposes.

Various (parts of) microarray data analysis procedures have been described in the literature (cf., e.g., [48, 125]) and in the reference manuals and additional manuscripts provided with the packages at the Bioconductor web site [1]. In order to illustrate how these analyses can be carried out using R and Bioconductor, this section discusses a simple example microarray data analysis procedure (inspired by [48, Chapter 25]) that uses several common Bioconductor packages. The starting point for the analysis is a set of Affymetrix CEL files [22] that have been obtained from a microarray scanner (cf. Section 6.1.1). After preprocessing and filtering of these raw probe-level data, a differential expression analysis is carried out. Finally, the literature database PubMed [322] is queried for relevant articles using the names of the top differentially expressed genes as search keywords.
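The filtering and differential expression steps of such a procedure can be sketched in a few lines. The example below is a language-neutral illustration in Python, not the Bioconductor code the section refers to. It assumes that preprocessing has already produced log2 expression values, uses a simple mean-intensity threshold as the filter, and ranks genes by Welch's t statistic as a stand-in for whatever test the actual procedure applies. All gene names, values, and the helper names `welch_t` and `top_genes` are made up for illustration.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(var_a + var_b)


def top_genes(expression, n_control, min_mean=4.0, top=2):
    """Drop weakly expressed genes, then rank the rest by absolute t statistic."""
    scores = {}
    for gene, values in expression.items():
        if statistics.mean(values) < min_mean:
            continue  # filtering step: skip genes with low overall signal
        control, treated = values[:n_control], values[n_control:]
        scores[gene] = welch_t(treated, control)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top]


# Made-up log2 expression values: three control arrays, then three treated arrays.
expression = {
    "geneA": [7.1, 7.0, 7.2, 9.0, 9.2, 9.1],
    "geneB": [6.0, 6.3, 5.9, 6.1, 6.0, 6.2],
    "geneC": [2.0, 2.1, 1.9, 2.2, 2.0, 2.1],
    "geneD": [8.0, 8.2, 8.1, 6.9, 7.1, 7.0],
}
print(top_genes(expression, n_control=3))  # prints ['geneA', 'geneD']
```

The names returned by `top_genes` play the role of the "top differentially expressed genes" that are subsequently used as search keywords. A real analysis would also compute p-values and correct them for multiple testing, which this sketch omits.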
