analyses that are usually carried out next are mainly statistical methods.
This section gives a brief introduction to the concepts of preprocessing and
statistical analysis of microarray data. For more detailed information, the
reader is referred to, e.g., [48, 125].
Due to the measurement variations caused by systematic and stochastic
effects, raw data from microarray experiments needs to be preprocessed before
statistical analyses of the data can be performed. As described in [48, Section
1.4], the intensity Y of a single probe on a microarray is generically modeled
by:
Y = B + αS
where B stands for the background noise (optical effects, non-specific bind-
ing), α is a gain factor, and S the amount of measured specific binding (in-
cluding measurement error and probe effects). According to the additive-
multiplicative error model for microarray data, the measurement error is
usually modeled by
log(S) = θ + φ + ε
where θ is the logarithm of the true abundance of the molecule, φ is a
probe-specific effect, and ε is the measurement error. These models, or
similar ones, are the basis for various preprocessing methods.
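To make the two model equations concrete, the following minimal sketch simulates probe intensities from them. The Gaussian distributions for B and ε, the natural logarithm, and all parameter values (background_mean, background_sd, alpha, sigma) are assumptions chosen for illustration only; they are not taken from [48].

```python
import math
import random
import statistics


def simulate_probe_intensity(theta, phi, rng, background_mean=100.0,
                             background_sd=10.0, alpha=1.0, sigma=0.2):
    """Draw one intensity Y = B + alpha * S with log(S) = theta + phi + eps."""
    eps = rng.gauss(0.0, sigma)                    # error on the log scale (assumed Gaussian)
    s = math.exp(theta + phi + eps)                # specific binding S
    b = rng.gauss(background_mean, background_sd)  # additive background B (assumed Gaussian)
    return b + alpha * s


rng = random.Random(0)  # fixed seed so the simulation is reproducible

# A weakly and a strongly expressed molecule, measured 1000 times each.
low = [simulate_probe_intensity(2.0, 0.0, rng) for _ in range(1000)]
high = [simulate_probe_intensity(8.0, 0.0, rng) for _ in range(1000)]

mean_low = statistics.mean(low)
mean_high = statistics.mean(high)
```

Under these assumptions, the weakly expressed molecule yields intensities that are dominated by the additive background B, whereas for the strongly expressed molecule the multiplicative term αS dominates. This is one way to see why the background has to be estimated before low intensities can be interpreted.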
Typically, the preprocessing of microarray data involves three principal
steps (cf. [48, Section 1.2]):
1. Background adjustment aims at estimating the effects of non-specific bind-
ing and noise in the optical detection on the measured probe intensities
in order to adjust the measurements of specific hybridization accordingly.
2. Normalization adjusts the results further to make experiments of different
array hybridizations comparable.
3. Summarization combines the background-adjusted and normalized inten-
sities of multiple probes into one quantity that estimates the amount of
RNA transcript.
Various methods are available for each of the three steps (cf., e.g., [48,
Chapters 1-6] for an overview), and there are also preprocessing strategies
that involve additional steps (cf., e.g., [48, Chapter 2]). After
preprocessing, a matrix of expression values is available as the basis for
subsequent analyses.
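The three preprocessing steps can be sketched as follows. This is a deliberately simple illustration, not one of the methods described in [48]: the background estimate (the array minimum), the use of quantile normalization, the median of log2 intensities as summary, and all function and variable names are assumptions made for this example.

```python
import math
import statistics


def background_adjust(array, floor=1.0):
    """Subtract a crude background estimate (the array minimum), keeping values positive."""
    background = min(array)
    return [max(y - background, floor) for y in array]


def quantile_normalize(arrays):
    """Give every array the same empirical distribution (ties are broken by probe index)."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [statistics.mean(column) for column in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def summarize(array, probesets):
    """Combine the probes of each probe set into one value (median of log2 intensities)."""
    return {name: statistics.median(math.log2(array[i]) for i in indices)
            for name, indices in probesets.items()}


def preprocess(raw_arrays, probesets):
    """Run all three steps and return an expression matrix: gene -> one value per array."""
    adjusted = [background_adjust(a) for a in raw_arrays]
    normalized = quantile_normalize(adjusted)
    summaries = [summarize(a, probesets) for a in normalized]
    return {name: [s[name] for s in summaries] for name in probesets}


# Two hypothetical arrays with four probes each; the second array is brighter overall.
raw = [[120.0, 130.0, 520.0, 540.0],
       [220.0, 260.0, 1020.0, 1100.0]]
probesets = {"geneA": [0, 1], "geneB": [2, 3]}

normalized = quantile_normalize([background_adjust(a) for a in raw])
matrix = preprocess(raw, probesets)
```

In this toy input both arrays rank their probes identically, so after normalization they carry the same values, and the overall brightness difference between the two hybridizations no longer shows up in the expression matrix.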
A common first processing step carried out on the expression matrix is
the filtering of the expression values according to some (quality) criterion, in
order to take only samples with specific properties into account. Frequently
applied subsequent statistical analysis steps are, for instance, differential ex-
pression analysis (aiming at the identification of genes for which expression is
significantly different between the samples, cf. [48, Chapter 14]), and cluster
analysis (aiming at the recognition of patterns in gene expression profiles, cf.
[48, Chapter 13]).
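The analysis steps named above can be illustrated on a toy expression matrix. The sketch below assumes a variability threshold as the filtering criterion, Welch's t-statistic as the measure of differential expression, and one minus the Pearson correlation as the dissimilarity underlying a cluster analysis. These choices, the threshold min_sd, and the example data are hypothetical and are not prescribed by [48].

```python
import math
import statistics


def filter_genes(matrix, min_sd=0.2):
    """Keep genes whose expression varies across samples (hypothetical criterion)."""
    return {gene: values for gene, values in matrix.items()
            if statistics.stdev(values) >= min_sd}


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups of log-expression values."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def correlation_distance(x, y):
    """One minus the Pearson correlation of two expression profiles."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                     * sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / norm


# Toy log2 expression matrix: three control samples followed by three treated samples.
matrix = {
    "geneA": [5.0, 5.2, 4.9, 8.1, 7.9, 8.3],
    "geneB": [6.0, 6.1, 5.9, 9.0, 9.2, 8.9],
    "geneC": [7.0, 7.05, 6.95, 7.0, 7.02, 6.98],
}

kept = filter_genes(matrix)  # geneC is nearly constant and is dropped
t_stats = {gene: welch_t(values[3:], values[:3]) for gene, values in kept.items()}
dist_ab = correlation_distance(matrix["geneA"], matrix["geneB"])
```

A large absolute t-statistic only suggests differential expression; a real analysis would convert it into a p-value and correct for testing many genes at once. Similarly, the small distance between geneA and geneB only indicates that a clustering algorithm working on this dissimilarity would tend to group the two profiles together.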
 