Biomedical Engineering Reference
1. Introduction
1.1. Data Quality
Microarray technology is a powerful means of exploiting DNA sequence
information [1-5]. Because of the dramatic reduction in labor, time, and costs,
this technique has become a popular tool for studying thousands of genes
simultaneously. Gene expression profiling through this technology holds great
promise in biomedicine. For example, microarray technology can be
used in the identification of biomarkers, evaluation of prognoses,
classification of disease status, and prediction of clinical outcomes [1,2,6,7].
While this technology has merits for genomic research, assessment of data
quality poses a unique challenge because of the enormous volume of data [8,9,10].
There are at least two types of data quality assessment for Affymetrix
gene chips: internal and external. Examination
of internal data quality focuses on each individual gene chip, for example
inspection for the presence of artifacts, the use of spike-in genes to
evaluate sample hybridization efficiency, and the use of actin and GAPDH to
detect degradation of RNA and inefficient transcription. Affymetrix software
(e.g., GCOS or MAS [11,12]) provides some metrics to evaluate internal data
quality, such as scale factors, percent-present calls, background, and 3'/5'
ratios of housekeeping genes. In addition, R packages such as simpleaffy and
affyQCReport [13,14] present these metrics graphically for visual examination
of data quality.
For external data quality, assessment of array comparability is an
important issue because an analysis that includes incomparable arrays is
likely to generate invalid results. Unfortunately, issues of array
comparability have not been addressed adequately, either in the literature or
in practice. For example, several studies have used the Pearson correlation
and/or the scatterplot to check the degree of consistency among arrays
[15,16,17]. The Pearson correlation is a quantitative measure of the linear
relationship between two variables [18]. When the correlation is close to 1
(or -1), as one variable increases, the other variable tends to increase (or
decrease).
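As a minimal sketch of the measure just described, the following Python code computes the Pearson correlation from its textbook definition. The function name `pearson` and the six-value "arrays" are hypothetical and made up purely for illustration; they are not taken from any real chip or from the studies cited here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical intensities for the same six probes (illustrative values only).
array_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
array_b = [2.0 * v + 1.0 for v in array_a]  # increasing linear function of array_a
array_c = [10.0 - v for v in array_a]       # decreasing linear function of array_a
array_d = [math.exp(v) for v in array_a]    # increasing, but curved

r_linear_up = pearson(array_a, array_b)
r_linear_down = pearson(array_a, array_c)
r_curved = pearson(array_a, array_d)
```

With these made-up values, `r_linear_up` is 1 and `r_linear_down` is -1 up to floating-point rounding, while `r_curved` falls noticeably below 1 (about 0.85) even though `array_d` is completely determined by `array_a`. The coefficient therefore reflects how well a straight line fits, not how closely two arrays agree.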
Since data in the gene chips often show a nonlinear pattern, the
correlation may not be appropriate for examining array comparability.
Another approach is the scatterplot, a graphical technique that depicts the
relationship between two variables with one variable on the x-axis and the
other on the y-axis [19,20]. The tool has been applied successfully to
graphical exploration. However, its capability to handle large datasets is
limited. A huge dataset could hamper the application of scatterplot by