Microarray Data Analysis Pipelines - User-Level Workflow Design: A Bioinformatics Perspective


Fig. 6.6 Basic microarray data analysis workflow

6.2.3 Variable Microarray Data Analysis Pipeline

Figure 6.7 shows a variable workflow model that illustrates how the SIBs in the library can be combined into different microarray analysis pipelines. Similar to the variable multiple sequence alignment workflow discussed in Section 3.2, the SIBs in the model are preconfigured to be readily executable, and the analysis steps currently intended can be included simply by redirecting branches. The boxes in the figure represent the principal steps of microarray data analysis workflows and contain different SIBs (or combinations of SIBs) that realize the corresponding tasks:

1. In the input data loading step (which is naturally mandatory), the microarray raw data and the corresponding metadata are loaded. The user can choose between one of the benchmark data sets that are readily available on the jETI server and input data from the local file system.

2. Preprocessing is also mandatory. Here, AffyExpressPreprocess can be used to create an ExpressionSet object from the input data. Alternatively, one of RMA, GCRMA, Threestep, and Express can be used, followed by CreateExpressionSet.

3. It is recommended to apply one or more filtering steps to the expression values before applying further analyses.

4. Optionally, the expression values can be visualized in an HTML table (created by Annaffy aafTableInt and then stored to the local file system).

5. Statistical analysis, for instance to identify the top differentially expressed genes, is again considered mandatory. Optionally, a textual representation of the results can be written and stored to the local file system.

6. Finally, it is useful to retrieve further information about the top differentially expressed genes, for instance probe annotations (via Annaffy aafTableAnn) or related PubMed articles (via GetPubMedAbstracts), and to store the resulting (HTML) files.
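The branch choices described in the list can be modeled as a small validity check. The sketch below is a hypothetical Python model, not part of jABC or jETI. It treats each box as a step that is either mandatory or optional and lists the alternative SIB chains that can realize it, using the SIB names from the text. It then checks whether a chosen configuration is valid. The labels for the loading and statistics steps are placeholders, because the text does not name those SIBs. As a simplification, the model allows only one filtering chain.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the boxes in the variable workflow.
# SIB names follow the text; the load/statistics labels are placeholders.
Chain = Tuple[str, ...]

STEPS: List[Tuple[str, bool, List[Chain]]] = [
    # (step name, mandatory?, alternative SIB chains)
    ("load", True, [("benchmark data (jETI server)",), ("local file system",)]),
    ("preprocess", True, [("AffyExpressPreprocess",)]
        + [(m, "CreateExpressionSet") for m in ("RMA", "GCRMA", "Threestep", "Express")]),
    ("filter", False, [("GenefilterKOverA",)]),  # recommended, not enforced here
    ("visualize", False, [("Annaffy aafTableInt", "store HTML")]),
    ("statistics", True, [("differential expression",)]),
    ("annotate", False, [("Annaffy aafTableAnn", "store HTML"),
                         ("GetPubMedAbstracts", "store HTML")]),
]


def validate(config: Dict[str, Optional[Chain]]) -> List[str]:
    """Return the problems found in a branch configuration (empty list = valid)."""
    problems = []
    for name, mandatory, chains in STEPS:
        choice = config.get(name)
        if choice is None:
            if mandatory:
                problems.append(f"missing mandatory step: {name}")
        elif choice not in chains:
            problems.append(f"no such SIB chain for {name}: {choice}")
    return problems


# A configuration that uses only the mandatory steps.
minimal = {
    "load": ("benchmark data (jETI server)",),
    "preprocess": ("AffyExpressPreprocess",),
    "statistics": ("differential expression",),
}
print(validate(minimal))
print(validate({"load": ("local file system",)}))
# RMA alone is rejected: it must be followed by CreateExpressionSet.
print(validate({**minimal, "preprocess": ("RMA",)}))
```

Keeping the alternatives as explicit chains captures the rule that RMA, GCRMA, Threestep, and Express only yield an ExpressionSet together with CreateExpressionSet.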

The SIBs in the data loading, preprocessing, filtering, statistical analysis, and annotation boxes that are highlighted by the light-gray box in the figure correspond to the basic analysis workflow described above. The workflow defined by the branches as shown in Figure 6.7, in contrast, reads the input data from the local file system and uses Threestep for preprocessing and GenefilterKOverA for expression value filtering. It then creates and stores an HTML table from the expression values before the differential expression analysis is carried out. Finally, an annotation table is created and stored.
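The filtering step of the Figure 6.7 workflow uses GenefilterKOverA. The text does not state this SIB's parameters. The sketch below assumes that it follows the kOverA filter from Bioconductor's genefilter package, which keeps a gene if at least k of its expression values are strictly greater than A. The Python functions and the toy expression values are illustrative only.

```python
from typing import Callable, Dict, List, Sequence

# Assumed semantics (modeled on genefilter's kOverA(k, A)):
# keep a gene if at least k of its values are strictly greater than A.


def k_over_a(k: int, a: float) -> Callable[[Sequence[float]], bool]:
    """Return a predicate that is True if at least k values exceed a."""
    def keep(values: Sequence[float]) -> bool:
        return sum(1 for v in values if v > a) >= k
    return keep


def filter_genes(expr: Dict[str, List[float]],
                 predicate: Callable[[Sequence[float]], bool]) -> Dict[str, List[float]]:
    """Keep only the rows (genes) of an expression matrix that satisfy the predicate."""
    return {gene: vals for gene, vals in expr.items() if predicate(vals)}


# Made-up log2 expression values (rows: probe sets, columns: arrays).
expr = {
    "probe_1": [7.9, 8.4, 8.1, 7.7],
    "probe_2": [3.1, 2.8, 6.2, 3.0],
    "probe_3": [6.5, 6.9, 2.2, 7.1],
}
kept = filter_genes(expr, k_over_a(k=2, a=6.0))
print(sorted(kept))  # ['probe_1', 'probe_3']
```

Here probe_2 is removed because only one of its four values exceeds 6.0. Filtering like this reduces the number of genes passed on to the differential expression analysis.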
