Biomedical Engineering Reference
In-Depth Information
rendering the data both misleading and inefficient. Specifically, its
inability to distinguish high-density from low-density regions in a large
dataset can mislead the human eye by overemphasizing areas that contain only
a few data points and downplaying areas of high density. The scatterplot
also becomes inefficient as the number of data points grows, because the
tool often takes longer to complete the display, especially for data with
more than a million points. The slow display may cause the graphical window
to freeze, or may halt the system when switching from one program window to
another on a PC. Moreover, the resulting huge file (more than 10 MB for a
million points) is inconvenient to deliver to clients, whether by email, on
a floppy diskette, or in print.
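To make the overplotting problem concrete, the minimal sketch below bins points into a fixed grid and counts the points per cell, which is one common way to expose the density that a raw scatterplot hides. The function name `bin_counts`, the grid size, and the simulated Gaussian data are assumptions chosen for illustration; they do not come from the text.

```python
import random


def bin_counts(xs, ys, nbins, lo, hi):
    """Count points in each cell of an nbins x nbins grid over [lo, hi).

    Rows of the returned grid index y, columns index x.
    Points outside [lo, hi) on either axis are ignored.
    """
    width = (hi - lo) / nbins
    grid = [[0] * nbins for _ in range(nbins)]
    for x, y in zip(xs, ys):
        if lo <= x < hi and lo <= y < hi:
            # min() guards against floating-point rounding at the upper edge.
            i = min(int((y - lo) / width), nbins - 1)
            j = min(int((x - lo) / width), nbins - 1)
            grid[i][j] += 1
    return grid


# Simulated data (an assumption): 100,000 points from a standard bivariate normal.
random.seed(0)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

grid = bin_counts(xs, ys, nbins=20, lo=-4.0, hi=4.0)
center = grid[10][10]   # cell just above and to the right of the origin
corner = grid[0][0]     # cell in the far lower-left tail
total = sum(map(sum, grid))
```

Whatever the number of input points, the summary stays at 20 x 20 = 400 counts, so the size of a plot drawn from `grid` no longer grows with the data, and the count in each cell states the density directly instead of leaving it to the eye.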
1.2. Gene-Level Data Analysis
The Affymetrix oligonucleotide gene chip has been widely used to study gene
expression profiling in the genomic community [21-28]. The Affymetrix array
uses a set of probes to interrogate the expression of each gene, and each
array covers thousands of genes. An experiment routinely collects a huge
volume of information, and the data structure can be quite complicated.
Analyzing such complex data challenges biostatisticians to develop an
approach to summarizing probe-level information that adequately reflects
the level of gene expression while accounting for probe variation, chip
variation, and interaction effects. In addition, due to resource limitations
and/or sample availability, many microarray experiments, such as in vitro
studies, have only a small number of replicates, so statistical inferences
such as p-value significance testing or confidence interval analysis, which
work well with a large sample size, often break down and become impractical.
For example, MAS 5.0 employs Tukey's biweight approach to summarize gene
expression intensity from the modified perfect-match (PM) and mismatch (MM)
signals [12]. dChip analyzes probe-level intensities using a multiplicative
model that decomposes each probe signal into the product of a gene
expression index and a probe-sensitivity index [29]. Robust multi-array
analysis (RMA) uses a stochastic-model-based approach to improve the
preprocessing of array data by taking into account optical noise,
nonspecific binding, and probe-specific effects [30]. Specifically, RMA
employs a log-scale linear additive model to analyze gene expression based
on PM intensities that have been background corrected and normalized.
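A log-scale additive model of this kind writes the log2 PM intensity of probe j on array i as an array-level expression value plus a probe effect plus noise. The sketch below fits such a model with Tukey's median polish, the robust procedure that RMA implementations commonly use. The text above does not name the fitting method, so treat that choice as an assumption, along with the function name `median_polish` and the toy intensities. The input is assumed to be already background corrected and normalized.

```python
import math
from statistics import median


def median_polish(rows, n_iter=10, tol=1e-8):
    """Fit y[i][j] = overall + row_eff[i] + col_eff[j] + residual.

    rows: one list per array, holding log2 PM intensities (one per probe).
    Returns (overall, row_eff, col_eff, residuals).
    """
    resid = [list(r) for r in rows]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row (array) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Re-center the column effects, moving their median into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep the column (probe) medians out of the residuals.
        col_med = [median([resid[i][j] for i in range(n_rows)])
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-center the row effects in the same way.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        # Stop once a full pass removes essentially nothing more.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Toy data (an assumption): 3 arrays x 4 probes, built to be exactly additive
# on the log2 scale so the fit can be checked by hand.
true_expr = [8.0, 9.0, 10.0]
true_probe = [-1.0, 0.0, 0.5, 1.5]
pm = [[2.0 ** (e + p) for p in true_probe] for e in true_expr]

log_pm = [[math.log2(v) for v in row] for row in pm]
overall, array_eff, probe_eff, resid = median_polish(log_pm)

# Expression summary for each array on the log2 scale.
expr_hat = [overall + a for a in array_eff]
```

In this sketch the per-array summary is `overall + array_eff[i]`. It recovers the differences between arrays exactly for the noise-free toy data, while its absolute level absorbs the median of the probe effects.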
The advantage of RMA is improved precision (compared to the large
variation in MAS 5.0). However, this approach may cause