use of different platforms, can cause bias because of distortions in distribution, scale of intensity expression, etc. Such systematic differences present a substantial obstacle to the analysis of micro-array data and may result in inconsistent and unreliable information. Therefore, one of the most pressing challenges in meta-analysis is how to integrate results from different micro-array experiments, or how to combine data sets prior to the specific analysis. In general, in order to perform the meta-analysis efficiently, the procedure for combining independent data sets falls into three steps, detailed below: identification of common probesets, normalization, and transformation of distribution.
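The first two of these steps can be sketched in code. The data, names, and pandas-based layout below are illustrative assumptions, not taken from the cited papers:

```python
import numpy as np
import pandas as pd

# Hypothetical expression matrices: rows = probesets (indexed here by a
# stand-in for the probeset target sequence), columns = arrays/samples.
ref = pd.DataFrame([[4., 1., 9.],
                    [2., 5., 3.],
                    [8., 2., 6.],
                    [1., 7., 2.],
                    [5., 3., 4.]],
                   index=["s1", "s2", "s3", "s4", "s5"])
other = pd.DataFrame([[3., 6.],
                      [9., 2.],
                      [1., 8.],
                      [4., 5.]],
                     index=["s2", "s3", "s5", "s6"])

# Step 1: keep only probesets whose target sequences occur in both data sets.
common = ref.index.intersection(other.index)
ref, other = ref.loc[common], other.loc[common]

# Step 2: per-chip normalization -- divide each array (column) by its
# median, the simple scheme the text attributes to (Jiang et al., 2004).
ref = ref / ref.median(axis=0)
other = other / other.median(axis=0)
```

After this, both matrices share the same probeset rows and every array has median expression 1; the remaining step, distribution transformation, is detailed in its own section.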
Identifying a List of Common Probesets

Since data sets come from different chips or different platforms, it is essential to guarantee that same-named probesets (homonyms) in the combined data match identical (or approximately identical) oligo sequences. To obtain a more stringent subset of matched probesets/sequences, mapping common genes on the basis of probeset sequence comparison is more reliable and conservative: only the common probesets that share the same target sequences are retained across data sets (Jiang et al., 2004).

Per Data Normalization

Differences in the treatment of two samples, especially in labelling and in hybridization, bias the relative measures on any two arrays. Each data set is normalized individually to compensate for systematic technical differences between samples within the same data set, so that the systematic biological differences between samples show more clearly. (Jiang et al., 2004) proposed the simplest method for per-chip normalization, in which the expression of each probe set in each array is divided by the median of the data. More generally, normalization of individual data sets is performed via known existing methods: standard procedures such as RMA (Robust Multiarray Analysis) or LOWESS (Cleveland and Devlin, 1979) are frequently used and provide more robust normalizations (see section "MICRO-ARRAYS EXPERIMENTS").

Distribution Transformation

After per-chip normalization, the scales and distributions of two data sets may still be different; data distributions vary greatly between two raw, and even between two normalized, data sets. A distribution transformation method is therefore needed to ensure that the data sets have similar distributions before combining them (Jiang et al., 2004; Kim et al., 2007).

A simple transformation method is to rescale the gene expression ratios of each data set on the basis of a reference data set, using the pooled standard deviation and mean expression values, so that corresponding experimental groups show similar expression patterns (Kim et al., 2007). For instance, one of the data sets is selected as the reference, and each of the other data sets is then transformed on the basis of this reference. (Jiang et al., 2004) proposed a distribution transformation based on the Cumulative Distribution Function (CDF) that transforms each data set Y according to the following equation:

z = F_X^{-1}(F_Y(y)),

where X is the reference and F_X(.) and F_Y(.) are the respective distribution functions.
Consequently, Z and X have the same distribution.
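The CDF-based transformation can be implemented with empirical distribution functions. The following is a sketch; the function name and the choice of linear interpolation between order statistics are assumptions of this example, not details from (Jiang et al., 2004):

```python
import numpy as np

def cdf_transform(y, x_ref):
    """Approximate z = F_X^{-1}(F_Y(y)): map the values of a data set y
    onto the distribution of a reference data set x_ref, using empirical
    CDFs and linear interpolation between order statistics."""
    y = np.asarray(y, dtype=float)
    x_sorted = np.sort(np.asarray(x_ref, dtype=float))
    # Empirical CDF of Y evaluated at each of its own values: rank / n.
    ranks = y.argsort().argsort()                 # 0..n-1, by magnitude
    f_y = (ranks + 1) / y.size
    # Empirical inverse CDF of the reference: interpolate into its
    # sorted values at the matching probability levels.
    probs = np.arange(1, x_sorted.size + 1) / x_sorted.size
    return np.interp(f_y, probs, x_sorted)
```

Because the mapping is monotone, the ordering of values within Y is preserved, while the quantiles of the transformed data match those of the reference, which is the sense in which Z and X share the same distribution.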
Once integration and transformation have been performed, statistical analyses may be carried out on these new data. The two following subsections are devoted to this subject.