Example 1

1. Sort columns of original matrix:

$$
x = \begin{Bmatrix} 4 & 7 & 2 & 9 \\ 5 & 2 & 8 & 5 \\ 1 & 3 & 5 & 8 \\ 8 & 2 & 3 & 4 \end{Bmatrix}
\Rightarrow
x_{\text{sort}} = \begin{Bmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{Bmatrix}
$$

2. Compute row means:

$$
x_{\text{sort}} = \begin{Bmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{Bmatrix}
\Rightarrow
\begin{Bmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{Bmatrix}
$$

3. Set mean for all columns:

$$
\begin{Bmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{Bmatrix}
\Rightarrow
x'_{\text{sort}} = \begin{Bmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{Bmatrix}
$$

4. Unsort columns to original order:

$$
x'_{\text{sort}} = \begin{Bmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{Bmatrix}
\Rightarrow
x_{\text{normalized}} = \begin{Bmatrix} 3.50 & 8.00 & 2.25 & 8.00 \\ 5.25 & 2.25 & 8.00 & 3.50 \\ 2.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 3.50 & 3.50 & 2.25 \end{Bmatrix}
$$

This method of distribution transformation is robust, simple, and easy to apply. While other methods perform the distribution transformation for only two datasets, RMA has the advantage of combining multiple individual datasets and normalizing them globally. The following section presents a comparative study that illustrates its efficiency.

Comparative Study

Here, we compare the RMA-based integration method proposed above with the CDF method reported in (Jiang et al., 2004). A two-sample Kolmogorov-Smirnov test is used to compare the distribution of the data combined by CDF with that of the data combined by the RMA-based procedure described in the previous subsection. The test statistic is the maximal distance between the empirical distributions of the two samples. In this test, we obtained a distance before and after combining the data of around D = 0.07, with p-value < 2.2e-16. This result indicates that CDF and our RMA-based procedure give similar results on these data. We combined data of the Affymetrix GSE6475 and GSE9120 series (described previously in this chapter). Figure 2, which plots intensity densities for two different data samples, shows three curves for each sample:

1. the “single data via RMA” curve represents GSE6475 data transformed via standard RMA normalization,
2. the “meta data via RMA” curve represents GSE6475 data combined and transformed in a meta-analysis way with GSE9120 data via an RMA-based transformation as described in the previous subsection,
3. the “meta data via CFD” curve represents GSE6475 data combined and transformed in a meta-analysis way with GSE9120 data via a CDF transformation as described in the previous subsection on “Procedure of integration”.
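The distance D reported by a two-sample Kolmogorov-Smirnov test, as used in the comparative study, is the largest gap between the empirical distribution functions of the two samples. The following is a minimal pure-Python sketch of that statistic only; the function name `ks_distance` and the toy values are illustrative assumptions, not the data or code behind the reported D = 0.07, and no p-value is computed.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute gap between the empirical cumulative
    distribution functions (ECDFs) of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs are step functions, so the largest gap is reached at
    # one of the observed values; checking those points is sufficient.
    for v in a + b:
        ecdf_a = bisect_right(a, v) / len(a)
        ecdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Toy intensities (hypothetical, not taken from GSE6475 or GSE9120).
rma_like = [2.25, 3.50, 5.25, 8.00]
cdf_like = [2.30, 3.40, 5.30, 7.90]
print(ks_distance(rma_like, cdf_like))
```

For real analyses, a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value.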
We can observe that the RMA and CDF transformations give very similar results.
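The four steps of Example 1 (sort each column, compute row means, set the means in every column, unsort) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the RMA implementation itself: the function name `quantile_normalize` is hypothetical, the input is a plain list of rows, and tied values within a column are ranked in row order, which is the choice that reproduces the matrices of the example.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a row-major list of lists."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Step 1: sort each column, remembering where each value came from.
    # sorted() is stable, so tied values keep their original row order.
    orders = [sorted(range(n_rows), key=lambda i: col[i]) for col in columns]
    sorted_cols = [[col[i] for i in order] for col, order in zip(columns, orders)]

    # Step 2: mean of each row of the sorted matrix.
    row_means = [sum(c[r] for c in sorted_cols) / n_cols for r in range(n_rows)]

    # Steps 3 and 4: every column receives the same means, and each mean
    # is written back at the original position of the value it replaces.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, order in enumerate(orders):
        for rank, original_row in enumerate(order):
            result[original_row][j] = row_means[rank]
    return result


# The 4 x 4 matrix of Example 1.
x = [
    [4, 7, 2, 9],
    [5, 2, 8, 5],
    [1, 3, 5, 8],
    [8, 2, 3, 4],
]
for row in quantile_normalize(x):
    print(row)
```

After normalization every column contains the same set of values (2.25, 3.50, 5.25, 8.00), so all columns share one empirical distribution. Other tie-handling rules, such as averaging the means of tied ranks, are also in use and would change the entries for the two tied values in the second column.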
In fact, the quantile normalization method used in RMA is a specific case of the CDF transformation $z = F_X^{-1}(F_Y(x))$, where we estimate $F_Y$ by the empirical distribution of each array and $F_X$ using the empirical distribution of the averaged sample quantiles. However, our procedure does