Biomedical Engineering Reference
The fact that the disease state of the Columbia subjects could be predicted
without error should not be taken lightly. In this case the training data were
collected at one institution and the validation data at another. As a result,
the two sets of patients have different ethnic and geographical backgrounds,
the gene expression chips come from different batches, the protocols followed
for the gene expression assay differ, and the technicians doing the work have
different styles. That we could achieve perfect diagnosis on the Columbia
subjects in spite of these differences constitutes a proof of principle that
this set of genes can be used as markers in a gene expression-based
diagnostics procedure.
5. DISCUSSION AND CONCLUSIONS
Transcription data comprise only the tip of the iceberg of the highly com-
plex cellular systems that translate chemistry into life. Even so, gene expression
data have a very intricate structure that requires sophisticated tools for its study.
Experimental noise in gene expression data makes it necessary to measure a
large number of samples to increase the statistical power of the analyses to
"fish" for the signal in a sea of noise. Furthermore, the signal itself contains
variability inherent to a biological system. Thus, the potential information that
could be extracted from gene expression experiments is hidden in the data in
more than one way. Algorithms that probe different facets of the data are
needed to discover the hidden patterns, and each has its place in the gene
expression analysis tool kit, especially when it is tailored to look for a
particular statistical order or structure in the data. The complex gene-gene
interactions that give rise to interesting cellular behavior are probably best
captured with multivariate techniques, in which the unit of analysis is a group
of genes rather than a gene in isolation.
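
The point about multivariate analysis can be illustrated with a small toy
example. The sketch below is not taken from the chapter: the expression values
are synthetic, the names (gene_a, gene_b, best_threshold_accuracy) are
hypothetical, and a simple threshold rule stands in for whatever classifier one
might actually use. It shows two genes that carry no class information when
examined one at a time but separate the classes perfectly when examined as a
pair.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of any rule 'predict 1 if value > t', or its mirror image."""
    n = len(values)
    best = 0.0
    # Candidate cut points: one below every value, plus each observed value.
    for t in [min(values) - 1.0] + sorted(set(values)):
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # The mirrored rule ('predict 1 if value <= t') gets n - correct right.
        best = max(best, correct / n, (n - correct) / n)
    return best


# Synthetic (gene_a, gene_b) expression levels, coded 0 = low and 1 = high.
# Assumption for illustration: the class is 1 only when the two genes disagree.
samples = [(0, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 1), (1, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

gene_a = [a for a, _ in samples]
gene_b = [b for _, b in samples]
# A feature defined on the pair of genes rather than on either gene alone.
pair_feature = [abs(a - b) for a, b in samples]

acc_a = best_threshold_accuracy(gene_a, labels)
acc_b = best_threshold_accuracy(gene_b, labels)
acc_pair = best_threshold_accuracy(pair_feature, labels)

print("gene_a alone:", acc_a)
print("gene_b alone:", acc_b)
print("pair feature:", acc_pair)
```

In this contrived data each gene alone does no better than chance, so a
gene-by-gene filter would discard both, whereas a method that scores groups of
genes would retain the pair.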
Validation of the genes selected by our algorithms can be accomplished
using different strategies such as validation by statistical significance and valida-
tion by classification. But other methods exist that have not been sufficiently
explored. In §3.2 we have shown that the ability of the genes determined in one
data set to differentiate between class and control in an independent data set
(generated in different laboratories and using different technologies) constitutes
a rather stringent validation.
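
Validation on an independent data set can be sketched as follows. This is a
minimal illustration under stated assumptions, not the procedure used in the
chapter: the expression values are synthetic, the two-gene marker set is
hypothetical, and a nearest-centroid rule is swapped in as a stand-in
classifier. The classifier is fitted only on the training-site data and is then
scored on data from a second site whose values are shifted slightly, mimicking
a batch difference.

```python
def fit_centroids(X, y):
    """Mean expression vector of the marker genes for each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, row):
    """Label of the centroid closest to 'row' in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Synthetic training data from a hypothetical "site A" (two marker genes).
train_X = [[5.0, 1.0], [6.0, 1.5], [5.5, 0.5],
           [1.0, 5.0], [1.5, 6.0], [0.5, 5.5]]
train_y = ["disease"] * 3 + ["control"] * 3

# Synthetic independent data from a hypothetical "site B", offset by about +1
# in every gene to imitate a batch effect. It is never used for fitting.
valid_X = [[6.5, 2.0], [7.0, 2.5], [2.0, 6.5], [2.5, 7.0]]
valid_y = ["disease", "disease", "control", "control"]

centroids = fit_centroids(train_X, train_y)
valid_accuracy = accuracy(centroids, valid_X, valid_y)
print("accuracy on the independent set:", valid_accuracy)
```

The design choice that matters here is the strict separation of the two data
sets: gene selection and model fitting see only the training site, so the
accuracy on the second site is not inflated by information leaking from the
validation samples.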
We advocated a multipronged approach to gene expression data analysis,
consisting of combining different gene selection methods. There is a risk
associated with this choice, as a tradeoff exists between specificity and
sensitivity. By
combining different statistical filters, we certainly accomplish higher sensitivity,
yet in doing so we typically sacrifice specificity due to the acceptance of some
outcomes that otherwise would have been excluded. This is a classic dilemma in
statistics that can only be resolved in terms of the application context. For in-