Example 1

1. Sort columns of original matrix:

$$x = \begin{pmatrix} 4 & 7 & 2 & 9 \\ 5 & 2 & 8 & 5 \\ 1 & 3 & 5 & 8 \\ 8 & 2 & 3 & 4 \end{pmatrix} \Rightarrow x_{\mathrm{sort}} = \begin{pmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{pmatrix}$$

2. Compute row means:

$$x_{\mathrm{sort}} = \begin{pmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{pmatrix} \Rightarrow \begin{pmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{pmatrix}$$

3. Set mean for all columns:

$$\begin{pmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{pmatrix} \Rightarrow x'_{\mathrm{sort}} = \begin{pmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{pmatrix}$$

4. Unsort columns to original order:

$$x'_{\mathrm{sort}} = \begin{pmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{pmatrix} \Rightarrow x_{\mathrm{normalized}} = \begin{pmatrix} 3.50 & 8.00 & 2.25 & 8.00 \\ 5.25 & 2.25 & 8.00 & 3.50 \\ 2.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 3.50 & 3.50 & 2.25 \end{pmatrix}$$

This distribution transformation is robust, simple, and easy to apply. While other methods only transform the distributions of two datasets, the advantage of RMA is that it can combine multiple individual datasets and normalize them globally. The following section presents a comparative study that demonstrates its efficiency.

Comparative Study

Here, we compare the RMA-based integration method proposed above with the CDF transformation reported in (Jiang et al., 2004). A two-sample Kolmogorov-Smirnov test is used to compare the distribution of the data combined by CDF with that of the data combined by the RMA-based procedure described in the previous subsection. The test statistic D is the maximal distance between the empirical cumulative distribution functions of the two samples. The distance we obtained before and after combining the data was around D = 0.07, with p-value < 2.2e-16. This result shows that CDF and our RMA-based procedure give similar results on these data. We combined data of the Affymetrix GSE6475 and GSE9120 series (described previously in this chapter). Figure 2, which plots intensity densities for two different data samples, shows three curves for each sample:

1. The “single data via RMA” curve represents GSE6475 data transformed via standard RMA normalization.
2. The “meta data via RMA” curve represents GSE6475 data combined with GSE9120 data in a meta-analysis and transformed via the RMA-based transformation described in the previous subsection.
3. The “meta data via CFD” curve represents GSE6475 data combined with GSE9120 data in a meta-analysis and transformed via the CDF transformation described in the previous subsection on “Procedure of integration”.
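The distance D reported in the comparison is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical cumulative distribution functions of the two samples. A minimal NumPy sketch of how D can be computed follows. The function name `ks_distance` is hypothetical, only the statistic (not the p-value) is computed, and the two random samples are placeholders, not the GSE6475/GSE9120 data.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_a(t) - F_b(t)|.

    F_a and F_b are the empirical CDFs of samples a and b.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The supremum is reached at an observed value, so evaluate both
    # right-continuous empirical CDFs at every point of the pooled sample.
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Placeholder samples standing in for two sets of combined intensities.
rng = np.random.default_rng(0)
sample_1 = rng.normal(loc=0.0, scale=1.0, size=1000)
sample_2 = rng.normal(loc=0.1, scale=1.0, size=1000)
print(ks_distance(sample_1, sample_2))
```

In practice, SciPy's `scipy.stats.ks_2samp(a, b)` returns the same statistic together with a p-value.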
We can observe that the RMA-based and CDF transformations give very similar results.
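The four steps of Example 1 can be reproduced with the NumPy sketch below. It also implements the empirical CDF transformation $z = F_X^{-1}(F_Y(x))$, with $F_X$ taken as the distribution of the averaged sample quantiles, to show that both routes give the same matrix. The names `quantile_normalize` and `cdf_transform` are hypothetical, and breaking ties by row order (a stable sort) is an assumption of this sketch: it reproduces Example 1, but RMA software may handle ties differently.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes-by-arrays matrix.

    Ties within a column are broken by row order (stable sort); this is an
    illustrative choice, not necessarily what RMA software does.
    """
    x = np.asarray(x, dtype=float)
    # Step 1: sort each column, remembering where each value came from.
    order = np.argsort(x, axis=0, kind="stable")
    x_sort = np.take_along_axis(x, order, axis=0)
    # Step 2: mean of each row of the sorted matrix (the averaged sample quantiles).
    row_means = x_sort.mean(axis=1)
    # Steps 3-4: every column receives the same sorted values, placed back
    # in the original order. ranks[i, j] is the sorted position of x[i, j].
    ranks = np.argsort(order, axis=0)
    return row_means[ranks]


def cdf_transform(y, reference):
    """Empirical CDF transformation z = F_X^{-1}(F_Y(y)).

    F_Y is the empirical CDF of `y` (ties broken by position) and F_X is the
    empirical CDF of `reference`; the two may have different lengths.
    """
    y = np.asarray(y, dtype=float)
    ref = np.sort(np.asarray(reference, dtype=float))
    n, m = y.size, ref.size
    # 0-based rank r of each value, so that F_Y(y) = (r + 1) / n.
    r = np.argsort(np.argsort(y, kind="stable"))
    # Generalized inverse F_X^{-1}(p) = ref[ceil(p * m) - 1], in integer arithmetic.
    idx = ((r + 1) * m + n - 1) // n - 1
    return ref[idx]


x = np.array([[4, 7, 2, 9],
              [5, 2, 8, 5],
              [1, 3, 5, 8],
              [8, 2, 3, 4]])
print(quantile_normalize(x))

# Applying cdf_transform column by column, with the averaged sample
# quantiles as reference, gives the same matrix.
reference = np.sort(x, axis=0).mean(axis=1)
print(np.column_stack([cdf_transform(x[:, j], reference) for j in range(x.shape[1])]))
```

Both print calls output the $x_{\mathrm{normalized}}$ matrix of Example 1. Because `cdf_transform` evaluates $F_X^{-1}$ on the reference's own quantile grid, it also accepts a `y` and a `reference` of different lengths.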
In fact, the quantile normalization method used in RMA is a specific case of the CDF transformation $z = F_X^{-1}(F_Y(x))$, where we estimate $F_Y$ by the empirical distribution of each array and $F_X$ by the empirical distribution of the averaged sample quantiles. However, our procedure does