Biomedical Engineering Reference
one method, which can be called validation by classification, the selected genes
are validated if they perform adequately in predicting the tissue class
of a set of unknown samples. Typically, these unknown samples are
part of the original data set but are left out of the gene selection phase for later
validation. In a second approach, which could be called validation by
statistical significance, genes are chosen if they behave sufficiently differently
from what would be expected if there were no distinction between cases and
controls (the null hypothesis). We proposed an alternative validation method in
(1), a hybrid of the previous two, in which the selected
genes are validated if they behave consistently in a different data
set (from different laboratories and possibly a different technology, but the same types of
tissues).
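As an illustration of validation by statistical significance, the following is a minimal sketch of a permutation test for a single gene. It is not the procedure of any method cited in this chapter: the function name, the choice of the difference in mean expression as the test statistic, and the number of permutations are all assumptions made for the example. Shuffling the case and control labels simulates the null hypothesis that there is no distinction between the two groups.

```python
import random
from statistics import mean


def permutation_p_value(cases, controls, n_permutations=2000, seed=0):
    """Estimate a two-sided permutation p-value for one gene.

    The test statistic (absolute difference in mean expression) is an
    illustrative assumption; other statistics could be substituted.
    """
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to reassigning
        # the case/control labels at random (the null hypothesis).
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical expression values for one gene.
    cases = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4]
    controls = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
    print(permutation_p_value(cases, controls))
```

A gene would be retained under this scheme if its p-value falls below a chosen threshold. When many genes are tested at once, the threshold would normally need a multiple-testing adjustment, which this sketch leaves out.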
The typical outcome of a gene selection process is a list of genes that show
differential expression in cases and controls. If the data are analyzed using more
than one method, it is likely that the resulting gene lists will be different, albeit
overlapping. We will discuss how to deal with lists of genes coming from
different algorithms, and the advantages and disadvantages of creating the union
list or the intersection list of these genes. It is clear, however, that the explosion
in the number of methods to analyze expression data should be complemented
with a convergent effort in which different algorithms are used and their results
combined.
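The union and intersection lists mentioned above can be sketched with ordinary set operations. The gene identifiers and the helper name `combine_gene_lists` are hypothetical, and the per-gene vote count is an added convenience rather than something the chapter prescribes. The union keeps every gene selected by at least one method, while the intersection keeps only the genes on which all methods agree.

```python
from collections import Counter


def combine_gene_lists(gene_lists):
    """Return the union, the intersection, and per-gene vote counts.

    gene_lists is a sequence of gene lists, one per selection method.
    """
    sets = [set(genes) for genes in gene_lists]
    union = set().union(*sets)
    intersection = set.intersection(*sets) if sets else set()
    # Number of methods that selected each gene.
    votes = Counter(gene for s in sets for gene in s)
    return union, intersection, votes


if __name__ == "__main__":
    # Hypothetical outputs of three gene selection methods.
    method_a = ["G1", "G2", "G3"]
    method_b = ["G2", "G3", "G4"]
    method_c = ["G3", "G5"]

    union, intersection, votes = combine_gene_lists([method_a, method_b, method_c])
    print(sorted(union))
    print(sorted(intersection))
    print(votes.most_common())
```

The vote counts offer a middle ground between the two extremes: one could, for example, keep genes selected by at least two of the three methods.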
The lists of genes generated by the algorithms discussed in this chapter will
have to be organized, possibly with the help of literature search techniques (8) or
by systematically relating the selected genes to existing biological information
(9,10), in order to bring the results of microarray technology to meaningful
applications. Among these applications we can mention the identification
of potential drug targets (11), the discovery of disease-specific genes (12),
toxicogenomics (13), disease prognosis (14), and the molecular taxonomy of
diseases (15,16). It has been suggested, indeed, that microarrays will be routine
practice in clinical diagnostics within the next decade or so. Making this happen
will surely necessitate a larger number of samples in clinical trials and proof of
the robustness of the technology (16). Towards the end of this chapter we will
present an example in clinical diagnostics which shows that the technology is
indeed reaching a state of maturity, both in terms of the algorithms used for gene
selection and in terms of the DNA array technology.
2. PREVIOUS WORK: GENE SELECTION METHODS IN MICROARRAY DATA
To organize the presentation, we will separate the discussion of gene selection
algorithms into univariate and multivariate methods. In either case, the
genes selected as informative need to be validated in one way or another. Two