Microarray Data Analysis Pipelines - User-Level Workflow Design: A Bioinformatics Perspective


background value of 7.0 or higher. Then (lines 23-25), a simple differential expression analysis is carried out using the limma package. More precisely, the script calls the functions lmFit and eBayes, which fit a linear model to the expression data and rank the genes in order of evidence for differential expression, respectively, and creates a TopTable object containing the top-ranked genes. Finally (line 28 ff.), the names of the genes in the TopTable are extracted and used for a PubMed query. The results are written into an HTML file in table format.
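The filtering and ranking steps just described can be approximated with a small base R sketch that runs without Bioconductor. It is not the script shown in the figure. The toy matrix and sample groups are invented. The filter rule (keep a gene if at least one array reaches 7.0) is an assumption, since the full criterion is not given here. Per-gene Welch t-tests from the stats package stand in for limma's lmFit and eBayes, which additionally moderate the gene-wise variances using an empirical Bayes approach.

```r
# Toy log2 expression matrix (hypothetical values): 6 genes x 6 arrays
expr <- matrix(
  c(8.0, 8.2, 7.9, 10.1, 10.3,  9.9,   # gene1: strongly up in "treated"
    7.5, 7.6, 7.4,  7.5,  7.4,  7.6,   # gene2: unchanged
    4.0, 4.2, 3.9,  4.1,  4.0,  4.3,   # gene3: never reaches 7.0
    9.0, 9.3, 8.8,  8.0,  8.1,  7.8,   # gene4: down in "treated"
    5.0, 5.5, 5.2,  6.0,  6.1,  5.9,   # gene5: never reaches 7.0
    7.0, 7.2, 6.9,  7.8,  8.0,  7.7),  # gene6: moderately up
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("array", 1:6))
)
groups <- factor(rep(c("control", "treated"), each = 3))

# Filtering (assumed rule): keep genes whose maximum value is at least 7.0
keep <- apply(expr, 1, max) >= 7.0
filtered <- expr[keep, , drop = FALSE]

# Ranking: Welch two-sample t statistic per gene (stand-in for lmFit + eBayes)
t_stat <- apply(filtered, 1, function(x) {
  unname(t.test(x[groups == "treated"], x[groups == "control"])$statistic)
})

# "Top table": genes ordered by absolute t statistic, largest first
top_table <- data.frame(gene = names(t_stat), t = t_stat,
                        row.names = NULL, stringsAsFactors = FALSE)
top_table <- top_table[order(abs(top_table$t), decreasing = TRUE), ]

# Gene names of the three top-ranked genes, e.g. as input for a literature query
top_genes <- head(top_table$gene, 3)
print(top_table)
```

With limma installed, the ranking block would instead call lmFit on the filtered data together with a design matrix, pass the fit to eBayes and retrieve the ranking with topTable. The moderated statistics are generally more stable than plain t-tests when only a few arrays per group are available.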

The script can be executed directly, for instance via the R console as shown in Figure 6.4 (left). After successful execution, the HTML table containing the links to the related PubMed articles is available on the local file system and can be opened in any standard browser, as shown on the right side of the figure. Note that, as R is an interpreted language, the commands contained in the script can also be entered directly into the R console one after another.
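The output step can likewise be sketched in base R, with simplifying assumptions. The gene symbols and the output file name are hypothetical. Rather than querying PubMed and listing individual articles, the sketch only writes one search link per gene into an HTML table on the local file system, using PubMed's public search URL of the form https://pubmed.ncbi.nlm.nih.gov/?term=<query>.

```r
# Hypothetical top-ranked gene symbols (in the real script they come from the TopTable)
top_genes <- c("TP53", "BRCA1", "MYC")

# One PubMed search link per gene; URLencode escapes characters that are unsafe in URLs
links <- paste0("https://pubmed.ncbi.nlm.nih.gov/?term=",
                vapply(top_genes, utils::URLencode, character(1), reserved = TRUE))

# Assemble a simple HTML table: one row per gene with its search link
rows <- sprintf('<tr><td>%s</td><td><a href="%s">PubMed search</a></td></tr>',
                top_genes, links)
html <- c('<html><body><table border="1">',
          "<tr><th>Gene</th><th>PubMed</th></tr>",
          rows,
          "</table></body></html>")

# Write the page to the local file system (hypothetical file name in the temp directory)
out_file <- file.path(tempdir(), "pubmed_links.html")
writeLines(html, out_file)
# utils::browseURL(out_file) would open the page in the default browser
```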

Interestingly, working with Bioconductor essentially consists of accessing predefined functions from the plethora of available packages. Thus, developing R scripts based on Bioconductor packages is more a service-level orchestration of existing software building blocks than actual programming of (new) functionality. Higher-level objects such as the AffyBatch or ExpressionSet classes also provide convenient, user-level data objects that abstract from the internal details of data handling. Since they were conceived as programming libraries in the first place, however, most Bioconductor packages provide relatively fine-grained functionality, which keeps the development of Bioconductor programs at a lower, programming-language level and demands a good deal of R knowledge from the user.
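To illustrate what such a higher-level data object offers the user, the following sketch defines a deliberately minimal S4 class. The class bundles an expression matrix with per-sample annotation, checks that the two agree, and keeps them in sync when arrays are selected. The class SimpleExprSet and the functions exprMatrix, sampleInfo and selectSamples are hypothetical. Biobase's actual ExpressionSet is far richer and offers its own accessors such as exprs() and pData().

```r
library(methods)

# Hypothetical container: expression values (genes x arrays) plus sample annotation
setClass("SimpleExprSet",
         slots = c(exprs = "matrix", pheno = "data.frame"),
         validity = function(object) {
           if (ncol(object@exprs) != nrow(object@pheno)) {
             return("number of arrays (columns) must match number of sample annotations (rows)")
           }
           TRUE
         })

# Accessors hide the internal slot layout from the user
setGeneric("exprMatrix", function(object) standardGeneric("exprMatrix"))
setMethod("exprMatrix", "SimpleExprSet", function(object) object@exprs)
setGeneric("sampleInfo", function(object) standardGeneric("sampleInfo"))
setMethod("sampleInfo", "SimpleExprSet", function(object) object@pheno)

# Selecting arrays subsets matrix columns and annotation rows together
selectSamples <- function(object, idx) {
  new("SimpleExprSet",
      exprs = object@exprs[, idx, drop = FALSE],
      pheno = object@pheno[idx, , drop = FALSE])
}

# Usage with invented values: one gene measured on four arrays
arrays <- paste0("array", 1:4)
eset <- new("SimpleExprSet",
            exprs = matrix(c(8.0, 8.2, 10.1, 10.3), nrow = 1,
                           dimnames = list("gene1", arrays)),
            pheno = data.frame(group = c("control", "control", "treated", "treated"),
                               row.names = arrays))
treated <- selectSamples(eset, sampleInfo(eset)$group == "treated")
print(exprMatrix(treated))
```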

6.2 Microarray Data Analysis Workflows

Fig. 6.5 Abstract microarray data analysis pipeline (labels: Affymetrix CEL data input; statistical analysis; result output; preprocessing; [filtering]; [annotation])
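Read as a program, the abstract pipeline of Fig. 6.5 is a chain of stages in which filtering and annotation may or may not be present. The following base R sketch models only that structure. It is not how Bio-jETI represents workflows, and the stage functions are hypothetical placeholders that merely record their names. The position of the optional stages in the chain is also an assumption, based on the usual order of microarray analyses.

```r
# Hypothetical stage constructor: each stage takes the running state (here a
# character log of completed stages) and returns it extended by its own name
stage <- function(name) {
  force(name)
  function(state) c(state, name)
}

# Assemble one workflow variant; filtering and annotation are optional
build_pipeline <- function(filtering = FALSE, annotation = FALSE) {
  steps <- list(stage("input: Affymetrix CEL data"), stage("preprocessing"))
  if (filtering)  steps <- c(steps, list(stage("filtering")))
  steps <- c(steps, list(stage("statistical analysis")))
  if (annotation) steps <- c(steps, list(stage("annotation")))
  c(steps, list(stage("result output")))
}

# Run the stages left to right, feeding each one the previous stage's result
run_pipeline <- function(steps, state = character(0)) {
  Reduce(function(acc, f) f(acc), steps, state)
}

print(run_pipeline(build_pipeline(filtering = TRUE, annotation = TRUE)))
```

Each combination of the two flags yields one workflow variant, which mirrors the point that the concrete analysis and annotation steps depend on the data and the analysis objectives.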

This last scenario demonstrates how biostatistics workflows for the analysis of microarray data can be designed with Bio-jETI, making use of different Bioconductor libraries (cf. Section 6.1.3) in the underlying services. As sketched in Section 6.1.2 and shown in Figure 6.5, microarray data analysis basically consists of a sequence of statistical analysis steps, possibly supplemented by data annotation based on external knowledge. However, the analysis and annotation steps actually applied vary considerably, as they depend both on the nature of the input data and on the analysis objectives. Accordingly, the abstract microarray analysis pipeline depicted in Figure 6.5 gives only a rather simplified characterization of the workflow variants: while preprocessing and
