A Data Warehousing Approach for Genomics Data Meta-Analysis - Evolving Application Domains of Data Warehousing and Mining


Example 1

1. Sort columns of original matrix:

$$
x = \begin{pmatrix} 4 & 7 & 2 & 9 \\ 5 & 2 & 8 & 5 \\ 1 & 3 & 5 & 8 \\ 8 & 2 & 3 & 4 \end{pmatrix}
\;\Rightarrow\;
x_{sort} = \begin{pmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{pmatrix}
$$

2. Compute row means:

$$
x_{sort} = \begin{pmatrix} 1 & 2 & 2 & 4 \\ 4 & 2 & 3 & 5 \\ 5 & 3 & 5 & 8 \\ 8 & 7 & 8 & 9 \end{pmatrix},
\qquad \text{row means} = \begin{pmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{pmatrix}
$$

3. Set mean for all columns:

$$
\begin{pmatrix} 2.25 \\ 3.50 \\ 5.25 \\ 8.00 \end{pmatrix}
\;\Rightarrow\;
x_{sort} = \begin{pmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{pmatrix}
$$

4. Unsort columns to original order:

$$
x_{sort} = \begin{pmatrix} 2.25 & 2.25 & 2.25 & 2.25 \\ 3.50 & 3.50 & 3.50 & 3.50 \\ 5.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 8.00 & 8.00 & 8.00 \end{pmatrix}
\;\Rightarrow\;
x_{normalized} = \begin{pmatrix} 3.50 & 8.00 & 2.25 & 8.00 \\ 5.25 & 2.25 & 8.00 & 3.50 \\ 2.25 & 5.25 & 5.25 & 5.25 \\ 8.00 & 3.50 & 3.50 & 2.25 \end{pmatrix}
$$

This method of distribution transformation is robust, simple, and easy to apply. While other methods only perform the distribution transformation for two datasets, the advantage of RMA is that it allows combining multiple individual datasets and normalizing them globally. The following section presents a comparative study that demonstrates its efficiency.

Comparative Study

Here, we compare the RMA-based integration method proposed above with the CDF method reported in (Jiang et al., 2004). A two-sample Kolmogorov-Smirnov test is used to compare the distribution of the data combined by CDF with that of the data combined by the RMA-based procedure described in the previous subsection. The test statistic is the maximal distance between the empirical distribution functions of the two samples. In this test, we obtained a distance before and after combining the data of around D = 0.07, with p-value < 2.2e-16. This result shows that CDF and our RMA-based procedure behave similarly on these data. We combined data of the Affymetrix GSE6475 and GSE9120 series (described previously in this chapter). Figure 2, which plots intensity densities for two different data samples, shows three curves for each sample:

• the "single data via RMA" curve represents GSE6475 data transformed via standard RMA normalization,

• the "meta data via RMA" curve represents GSE6475 data combined and transformed in a meta-analysis way with GSE9120 data via an RMA-based transformation as described in the previous subsection,

• the "meta data via CDF" curve represents GSE6475 data combined and transformed in a meta-analysis way with GSE9120 data via a CDF transformation as described in the previous subsection on "Procedure of integration".
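The distance D used in the comparative study, from the two-sample Kolmogorov-Smirnov test, can be illustrated with a minimal sketch. It assumes pure Python and computes only the statistic, not the p-value. The function name `ks_distance` and the two short intensity lists are hypothetical and chosen for illustration; they are not the GSE6475 or GSE9120 data.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest absolute
    gap between the empirical distribution functions of the samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so the largest gap is attained at one of those values.
    for value in a + b:
        ecdf_a = bisect_right(a, value) / len(a)
        ecdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical intensities, for illustration only.
via_rma = [2.25, 3.50, 5.25, 8.00]
via_cdf = [2.30, 3.40, 5.25, 8.10]
print(ks_distance(via_rma, via_cdf))  # 0.25
```

A value of D close to 0 means the two empirical distributions nearly coincide, and D = 1 means they do not overlap at all. On real expression data, a statistics library would normally be used to obtain the p-value as well.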

We can observe that the RMA and CDF transformations give results very close to each other.
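The four steps of Example 1 (sort each column, average across rows, assign the means, unsort) can be sketched in a few lines of Python. This is an illustrative sketch rather than the RMA implementation itself. The function name `quantile_normalize` is hypothetical, and ties are assumed to be broken by original row order, which matches the example; other implementations may average tied ranks instead.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a matrix given as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Step 1: sort each column, remembering which row each value came from.
    # sorted() is stable, so tied values keep their original row order.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    sorted_cols = [[col[i] for i in order] for col, order in zip(columns, orders)]

    # Step 2: mean of each row of the column-sorted matrix.
    row_means = [sum(col[r] for col in sorted_cols) / n_cols for r in range(n_rows)]

    # Steps 3 and 4: replace every sorted value by its row mean, then put
    # each value back in the row it originally came from.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, order in enumerate(orders):
        for rank, original_row in enumerate(order):
            normalized[original_row][j] = row_means[rank]
    return normalized


# Input matrix of Example 1 (rows are probes, columns are arrays).
x = [
    [4, 7, 2, 9],
    [5, 2, 8, 5],
    [1, 3, 5, 8],
    [8, 2, 3, 4],
]
for row in quantile_normalize(x):
    print(row)
# [3.5, 8.0, 2.25, 8.0]
# [5.25, 2.25, 8.0, 3.5]
# [2.25, 5.25, 5.25, 5.25]
# [8.0, 3.5, 3.5, 2.25]
```

After normalization every column contains the same set of values (2.25, 3.50, 5.25, 8.00), so all arrays share one empirical distribution.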

In fact, the quantile normalization method used in RMA is a specific case of the CDF transformation $z = F_X^{-1}(F_Y(x))$, where we estimate $F_Y$ by the empirical distribution of each array and $F_X$ by the empirical distribution of the averaged sample quantiles. However, our procedure does
