Microarray Data Analysis Pipelines - User-Level Workflow Design: A Bioinformatics Perspective


analyses that are usually carried out next are mainly statistical methods.

This section gives a brief introduction to the concepts of preprocessing and statistical analysis of microarray data. For more detailed information, the reader is referred to, e.g., [48, 125].

Due to the measurement variations caused by systematic and stochastic effects, raw data from microarray experiments needs to be preprocessed before statistical analyses of the data can be performed. As described in [48, Section 1.4], the intensity Y of a single probe on a microarray is generically modeled by:

Y = B + αS

where B stands for the background noise (optical effects, non-specific binding), α is a gain factor, and S is the amount of measured specific binding (including measurement error and probe effects). According to the additive-multiplicative error model for microarray data, the measurement error is assumed to be multiplicative, so that S is usually modeled on the logarithmic scale by:

log(S) = θ + φ + ε

where θ is the logarithm of the true abundance of the molecule, φ is a probe-specific effect, and ε is the measurement error. These models, or similar ones, are the basis for various preprocessing methods.
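To make the two models concrete, the following Python sketch draws simulated probe intensities from Y = B + αS with log(S) = θ + φ + ε, where ε denotes the measurement error. The distributional choices are assumptions for illustration only: B and ε are taken to be normally distributed with made-up default parameters, and the natural logarithm is used. The function name `simulate_probe_intensity` is hypothetical.

```python
import math
import random


def simulate_probe_intensity(theta, phi, background_mean=100.0, background_sd=10.0,
                             gain=1.0, error_sd=0.1, rng=None):
    """Draw one probe intensity Y = B + alpha * S, where log(S) = theta + phi + eps.

    Illustrative assumptions (not from the source): B ~ Normal(background_mean,
    background_sd), eps ~ Normal(0, error_sd), natural log; defaults are made up.
    """
    if rng is None:
        rng = random.Random()
    eps = rng.gauss(0.0, error_sd)                          # measurement error
    specific = math.exp(theta + phi + eps)                  # S on the linear scale
    background = rng.gauss(background_mean, background_sd)  # B
    return background + gain * specific                     # Y = B + alpha * S


if __name__ == "__main__":
    rng = random.Random(42)
    # Same transcript abundance (theta) seen through probes of different affinity (phi).
    for phi in (-0.5, 0.0, 0.5):
        y = simulate_probe_intensity(theta=5.0, phi=phi, rng=rng)
        print(f"phi={phi:+.1f}  Y={y:.1f}")
```

Because B is added on the linear scale, log(Y) is not simply additive in θ and φ; at low abundance the background dominates Y, which is one reason many pipelines adjust for background before taking logarithms.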

Typically, the preprocessing of microarray data involves three principal steps (cf. [48, Section 1.2]):

1. Background adjustment aims at estimating the effects of non-specific binding and of noise from optical detection on the measured probe intensities, in order to adjust the measurements of specific hybridization accordingly.
2. Normalization further adjusts the results so that measurements from different array hybridizations become comparable.
3. Summarization combines the background-adjusted and normalized intensities of multiple probes into one quantity that estimates the amount of RNA transcript.
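The three steps can be chained as in the following Python sketch. Each step uses one simple, well-known technique chosen here for illustration, not necessarily the methods discussed in [48]: background adjustment subtracts a crude per-array background estimate (a low quantile of the intensities, floored at a small positive value), normalization uses quantile normalization, and summarization applies Tukey's median polish to log2 intensities, the summarization step used by RMA. The array layout, in particular the `probe_sets` mapping from probe-set identifiers to probe indices, and all function names are hypothetical.

```python
import math
import statistics


def background_adjust(values, quantile=0.05, floor=1.0):
    """Subtract a crude per-array background estimate (a low quantile of the
    intensities) and floor the result so that the later log2 stays defined.
    Simplified stand-in: RMA, for instance, uses a convolution model instead."""
    ordered = sorted(values)
    background = ordered[int(quantile * (len(ordered) - 1))]
    return [max(floor, v - background) for v in values]


def quantile_normalize(arrays):
    """Give all arrays the same empirical distribution: the value at rank k is
    replaced by the mean of the rank-k values across arrays (ties are broken
    by position, a simplification)."""
    n = len(arrays[0])
    orders = [sorted(range(n), key=lambda i: a[i]) for a in arrays]
    rank_means = [statistics.fmean(a[o[k]] for a, o in zip(arrays, orders))
                  for k in range(n)]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, idx in enumerate(order):
            out[idx] = rank_means[k]
        normalized.append(out)
    return normalized


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey's median polish on a probes x arrays matrix of log2 intensities.
    Returns overall effect + array (column) effect, one value per array."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    z = [list(row) for row in matrix]                 # residuals
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        for i in range(n_rows):                       # sweep out row (probe) medians
            m = statistics.median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        delta = statistics.median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        for j in range(n_cols):                       # sweep out column (array) medians
            m = statistics.median(z[i][j] for i in range(n_rows))
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        delta = statistics.median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]


def preprocess(arrays, probe_sets):
    """arrays: raw probe intensities, one list per array, same probe order.
    probe_sets: hypothetical mapping probe-set id -> list of probe indices.
    Returns probe-set id -> one log2 expression estimate per array."""
    adjusted = [background_adjust(a) for a in arrays]
    normalized = quantile_normalize(adjusted)
    return {
        ps: median_polish([[math.log2(arr[p]) for arr in normalized] for p in probes])
        for ps, probes in probe_sets.items()
    }


if __name__ == "__main__":
    raw = [[120.0, 340.0, 95.0, 880.0, 410.0, 150.0],
           [130.0, 360.0, 90.0, 910.0, 450.0, 170.0]]
    layout = {"probeset_A": [0, 1, 2], "probeset_B": [3, 4, 5]}
    print(preprocess(raw, layout))
```

Quantile normalization forces the arrays to share one intensity distribution, and median polish works with medians rather than means, so a single outlying probe has limited influence on the summarized value.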

Various methods are available for each of the three steps (cf., e.g., [48, Chapters 1-6] for an overview), and there are also preprocessing strategies that involve additional steps (cf., e.g., [48, Chapter 2]). After preprocessing, a matrix of expression values is available as the basis for subsequent analyses.

A common first processing step carried out on the expression matrix is to filter the expression values according to some (quality) criterion, so that only samples with specific properties are taken into account. Frequently applied subsequent statistical analysis steps include differential expression analysis, which aims at identifying genes whose expression differs significantly between groups of samples (cf. [48, Chapter 14]), and cluster analysis, which aims at recognizing patterns in gene expression profiles (cf. [48, Chapter 13]).
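These downstream steps can also be sketched in Python, operating on a dictionary that maps gene identifiers to one log2 expression value per sample. The choices are assumptions for illustration: the filter keeps genes (rows) whose mean expression reaches a made-up threshold, differential expression is scored with a plain Welch two-sample t statistic (no p-values or multiple-testing correction), and cluster analysis uses basic k-means with Euclidean distance and a deterministic initialization. All function names, thresholds, and example data are hypothetical.

```python
import math
import statistics


def filter_genes(expr, min_mean=5.0):
    """Keep genes whose mean log2 expression reaches a (hypothetical) threshold."""
    return {g: v for g, v in expr.items() if statistics.fmean(v) >= min_mean}


def welch_t(x, y):
    """Welch two-sample t statistic (mean of x minus mean of y).
    Needs at least two values per group and nonzero variance in at least one."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se


def rank_by_t(expr, group_a, group_b):
    """Rank genes by |t| between two groups of sample (column) indices."""
    scores = {g: welch_t([v[i] for i in group_a], [v[i] for i in group_b])
              for g, v in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def kmeans(profiles, k, n_iter=20):
    """Basic k-means (Euclidean) on gene profiles; the first k genes serve as
    initial centroids, a deterministic simplification. Returns gene -> cluster."""
    names = list(profiles)
    centroids = [list(profiles[n]) for n in names[:k]]
    labels = {}
    for _ in range(n_iter):
        labels = {n: min(range(k), key=lambda c: math.dist(profiles[n], centroids[c]))
                  for n in names}
        for c in range(k):
            members = [profiles[n] for n in names if labels[n] == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    expr = {  # made-up data: samples 0-2 form group A, samples 3-5 group B
        "g1": [8.0, 8.2, 7.9, 5.0, 5.1, 4.8],
        "g2": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],
        "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
        "g4": [5.0, 5.2, 4.9, 8.0, 8.1, 7.8],
        "g5": [7.8, 8.0, 8.1, 5.2, 4.9, 5.0],
    }
    kept = filter_genes(expr)
    print(rank_by_t(kept, group_a=[0, 1, 2], group_b=[3, 4, 5]))
    print(kmeans(kept, k=2))
```

In practice, moderated test statistics and a correction for testing thousands of genes at once are usually preferred over a raw per-gene t statistic; the sketch only shows where each step sits in the workflow.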
