Biomedical Engineering Reference
Figure 1. Venn diagrams of the sets of genes identified in the analysis of diffuse large B-cell
lymphoma and follicular lymphoma by each of three methods: the signal-to-noise ratio (SNR),
Genes@Work (G@W), and the t-score. The numbers indicate the number of genes in each
set.
different method, then the argument that the two methods interrogate the data
differently would be questionable. We mentioned earlier that using the t-test
with a false discovery rate (FDR) of $5 \times 10^{-6}$ we discovered 100 genes. Of
these 100 genes, 24 were also discovered by Genes@Work (Figure 1). How
much should we relax the FDR of the t-score to capture 90% of the genes
discovered by Genes@Work? The answer to this question is very telling. To recover 90%
of the genes discovered by Genes@Work with the t-score method, the total
number of genes discovered by the t-statistic must grow to 1,839,
corresponding to an FDR of 0.5. This FDR value is far too permissive: at this false
discovery rate we expect about half of the 1,839 genes (roughly 920) to be false positives! Similar results
can be reached by exploring the other comparisons. We conclude that each of
these methods interrogates the data in its own specific way. Obviously, only
when the methods used are sufficiently different from each other does the
combination of algorithms contribute novelty beyond the application of just one of the
methods.
3.1. The Intersection or the Union?
It may be argued that the "best" genes differentiating between the
two classes under study are the 17 genes in the intersection of all three
methods. Even though this argument makes intuitive sense, it is not necessarily
true. Indeed, in each of the methods used above, genes were selected under
a relatively strict p-value threshold, which controls the specificity but not the sensitivity of
the method. In other words, when we are very stringent in preventing false genes