Biomedical Engineering Reference
from being selected, we will very likely also keep relevant genes from joining
our list of differentially expressing genes. A small p-value threshold gives us
a low false positive rate, but will probably make us incur a high false negative
rate. By interrogating the data with several methods that address distinct
aspects of the data, each method can rescue as true positives the genes left out
by the other methods as false negatives. At the same time, we can keep a low rate
of false positives by using low p-value thresholds for each of the participating
methods. The ultimate proof of the soundness of the selected genes is, in any
case, their biological validation. We address that issue next.
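The merging idea described above can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the authors'
implementation: the gene names, p-values, method labels, and the helper names
select_genes and merge_selections are all hypothetical, and each method is
assumed to report one p-value per gene.

```python
def select_genes(p_values, alpha):
    """Return the set of genes whose p-value falls below the threshold alpha."""
    return {gene for gene, p in p_values.items() if p < alpha}


def merge_selections(per_method_p_values, alpha):
    """Union of the genes selected by each method at the same strict threshold.

    A gene missed by one method (a false negative for that method) is kept
    as long as at least one other method selects it.
    """
    composite = set()
    for p_values in per_method_p_values.values():
        composite |= select_genes(p_values, alpha)
    return composite


# Made-up p-values for illustration only (hypothetical genes and methods).
per_method = {
    "t_score": {"geneA": 1e-6, "geneB": 0.2, "geneC": 0.03, "geneD": 0.6},
    "snr": {"geneA": 1e-5, "geneB": 1e-4, "geneC": 0.04, "geneD": 0.4},
    "pattern": {"geneA": 0.5, "geneB": 0.3, "geneC": 1e-8, "geneD": 0.9},
}

composite = merge_selections(per_method, alpha=1e-3)
print(sorted(composite))
```

In this toy input, geneB is missed by the first method and geneC by the first
two, yet both enter the composite because another method selects them, while
geneD is rejected by every method.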
3.2. Validation of Gene Selection by Consistency with Independent Data
As discussed earlier (§2), the typical validation strategies used in gene
selection schemes are validation by classification and validation by statistical
significance. An interesting alternative is what we have called the
validation-by-consistency method (1), in which the selected genes are validated
if they behave consistently in a different data set (different laboratories and
maybe different technology, but the same types of tissues).
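One simple way to make "consistency in behavior" concrete is sketched below in
Python. The text does not specify a scoring rule, so this is only an assumed
one: a gene counts as consistent if the sign of its mean expression difference
between the two tissue classes is the same in the discovery data set and in the
independent one. The function names, gene names, and expression values are
hypothetical.

```python
from statistics import mean


def direction(class_a_values, class_b_values):
    """Return +1 if mean expression is higher in class A, -1 if lower, 0 if equal."""
    diff = mean(class_a_values) - mean(class_b_values)
    return (diff > 0) - (diff < 0)


def consistency_fraction(discovery, independent, genes):
    """Fraction of genes whose direction of change agrees in both data sets.

    Each data set maps a gene name to a pair (class A values, class B values).
    Genes absent from either data set are skipped.
    """
    shared = [g for g in genes if g in discovery and g in independent]
    if not shared:
        return 0.0
    agreeing = 0
    for g in shared:
        d1 = direction(*discovery[g])
        d2 = direction(*independent[g])
        if d1 != 0 and d1 == d2:
            agreeing += 1
    return agreeing / len(shared)


# Made-up expression values for illustration only.
discovery = {
    "geneA": ([5.0, 6.0], [1.0, 2.0]),  # higher in class A
    "geneB": ([1.0, 1.5], [4.0, 5.0]),  # lower in class A
    "geneC": ([3.0, 4.0], [1.0, 1.0]),  # higher in class A
}
independent = {
    "geneA": ([7.0, 8.0], [2.0, 3.0]),  # higher in class A (agrees)
    "geneB": ([0.5, 1.0], [3.0, 3.5]),  # lower in class A (agrees)
    "geneC": ([1.0, 1.0], [2.0, 3.0]),  # lower in class A (disagrees)
}

fraction = consistency_fraction(discovery, independent, ["geneA", "geneB", "geneC"])
print(f"{fraction:.2f}")
```

A sign check ignores effect size and noise; it is meant only to show the shape
of the comparison between two independently generated data sets.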
In (49), for example, 30 of the 100 markers were verified as informative by
using the gene voting scheme introduced in (22), which is a
validation-by-classification approach. All 100 markers also passed a test of
statistical significance, and were thus validated by statistical significance.
In our case we could state that the statistical significance of the patterns
found by Genes@Work was stringent, given that the p-value of the least
significant pattern was 10^-10. However, statistical significance need not mean
biological relevance. Furthermore, when we combine several methods to discover
differentially expressing genes, the gene composite resulting from the use of
the different methods lacks an error estimate, even if a clear validation
approach was used in each of the chosen methods. Thus, it is desirable to have a
means by which to validate the composite, a task for which validation by
consistency can be extremely useful and telling.
We shall exemplify the validation-by-consistency approach in the set of
genes found by merging the three methods described in the previous section.
The heat map, or Eisen plot (51), of the 210 genes discovered by applying a
combination of the t-score, the SNR, and the Genes@Work methods on the
DLBCL/FL data generated at the Whitehead Institute can be viewed to the left
of the yellow line in Figure 2. Two groups of genes can be easily visualized: the
ones that overexpress in DLBCL compared to FL (mostly red in DLBCL and
mostly blue in FL), and the ones that underexpress in DLBCL compared with
FL (mostly blue in DLBCL and mostly red in FL). A stringent validation for
these genes would be to check that the same neat separation is achieved in a