background value of 7.0 or higher. Then (lines 23-25), a simple differential
expression analysis is carried out using the limma package. More precisely, the
script calls the functions lmFit and eBayes (which fit a linear model to the
expression data and rank the genes in order of evidence for differential
expression, respectively) and creates a TopTable object containing the
top-ranked genes. Finally (line 28 ff.), the names of the genes in the TopTable
are extracted and used for a PubMed query. The results are written into an HTML
file in table format.
The script can be executed directly, for instance via the R console as shown
in Figure 6.4 (left). After successful execution, the HTML table containing
the links to the related PubMed articles is available on the local file system
and can be opened in any standard browser, as shown on the right side of the
figure. Note that as R is an interpreted language, the commands contained in
the script can also be entered directly into the R console one after another.
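The analysis and reporting steps described above can be sketched in a few lines of R. This is a minimal illustration, not the script from the text. It assumes that the limma package is installed. The simulated expression matrix, the gene names (GENE1, GENE2, ...), the two-group design and the output file name are all hypothetical. A plain PubMed search link per gene stands in for the actual PubMed query of the original script.

```r
# Minimal sketch: simulated data stands in for the preprocessed expression matrix.
library(limma)

set.seed(42)
n_genes <- 200
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 0.3), nrow = n_genes,
               dimnames = list(paste0("GENE", seq_len(n_genes)),
                               paste0("sample", 1:6)))
# Hypothetical effect: shift the first five genes upward in the treated group.
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3

# Two-group design (assumption): three control and three treated samples.
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

fit <- lmFit(expr, design)                   # fit a linear model per gene
fit <- eBayes(fit)                           # moderated statistics for ranking
top <- topTable(fit, coef = 2, number = 10)  # ten top-ranked genes

# Extract the gene names and build one PubMed search link per gene.
genes <- rownames(top)
terms <- vapply(genes, URLencode, character(1), reserved = TRUE,
                USE.NAMES = FALSE)
urls  <- paste0("https://pubmed.ncbi.nlm.nih.gov/?term=", terms)

# Write the result as a simple HTML table.
rows <- sprintf('<tr><td>%s</td><td><a href="%s">PubMed search</a></td></tr>',
                genes, urls)
html <- c("<html><body><table>",
          "<tr><th>Gene</th><th>Literature</th></tr>",
          rows,
          "</table></body></html>")
out_file <- file.path(tempdir(), "top_genes_pubmed.html")
writeLines(html, out_file)
```

The commands can be pasted into the R console one after another, or saved to a file and run with source(). The resulting HTML file can then be opened in a browser.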
Interestingly, working with Bioconductor essentially consists of accessing
predefined functions from the plethora of available packages. Thus, developing
R scripts based on Bioconductor packages is more a service-level orchestration
of existing software building blocks than actual programming of (new)
functionality. In addition, higher-level objects like the AffyBatch or
ExpressionSet classes provide convenient, user-level data objects by abstracting
from the internal details of data handling. Being conceived as programming
libraries in the first place, however, most Bioconductor packages provide
relatively fine-grained functionality, which keeps the development of
Bioconductor programs at a lower, programming-language level and demands a good
deal of R knowledge from the user.
6.2 Microarray Data Analysis Workflows
Fig. 6.5 Abstract microarray data analysis pipeline (figure labels: Affymetrix CEL data as input; preprocessing; [filtering]; statistical analysis; [annotation]; result as output)
This last scenario demonstrates how biostatistics workflows for the analysis
of microarray data can be designed with Bio-jETI, making use of different
Bioconductor libraries (cf. Section 6.1.3) in the underlying services. As
sketched in Section 6.1.2 and shown in Figure 6.5, microarray data analysis
basically consists of a sequence of statistical analysis steps, possibly
supplemented by data annotation based on external knowledge. However, the
analysis and annotation steps that are actually applied vary considerably, as
they depend on both the nature of the input data and the analysis objectives.
Accordingly, the abstract microarray analysis pipeline depicted in Figure 6.5
gives only a rather simplified characterization of the workflow variants: While
preprocessing and