Biomedical Engineering Reference
SNR was assessed against the distribution of the SNRs of a similarly ranked
gene in 500 class-label permutation experiments (see (49) and supplementary
materials). We will use these 100 genes in combination with genes arising from
alternative gene selection methods.
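The rank-matched permutation assessment described above can be sketched as follows. This is a minimal illustration, not the procedure of (49): it assumes the common definition SNR = (mean_A - mean_B) / (sd_A + sd_B), tests only the upper tail, uses an add-one p-value estimate, and runs on synthetic data with hypothetical class sizes. The function names are invented for this example.

```python
import random
import statistics

def snr(a, b):
    # Assumed definition: difference of class means over the sum of class SDs.
    return (statistics.fmean(a) - statistics.fmean(b)) / (
        statistics.stdev(a) + statistics.stdev(b))

def ranked_snrs(rows, labels):
    """SNR of every gene (one row per gene), sorted from largest to smallest."""
    scores = []
    for row in rows:
        a = [x for x, lab in zip(row, labels) if lab == 0]
        b = [x for x, lab in zip(row, labels) if lab == 1]
        scores.append(snr(a, b))
    return sorted(scores, reverse=True)

def rank_matched_p_values(rows, labels, n_perm, rng):
    """Compare the k-th best observed SNR with the k-th best SNR
    obtained in each class-label permutation."""
    observed = ranked_snrs(rows, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted = ranked_snrs(rows, shuffled)
        for rank, (obs, perm) in enumerate(zip(observed, permuted)):
            if perm >= obs:
                exceed[rank] += 1
    # Add-one estimate keeps every p-value strictly positive.
    p_values = [(count + 1) / (n_perm + 1) for count in exceed]
    return observed, p_values

# Synthetic example: 50 genes, the first 5 shifted upward in class 0.
rng = random.Random(1)
labels = [0] * 20 + [1] * 20          # hypothetical class sizes
rows = [[rng.gauss(3.0 if (g < 5 and lab == 0) else 0.0, 1.0) for lab in labels]
        for g in range(50)]
observed, p_values = rank_matched_p_values(rows, labels, n_perm=100, rng=rng)
```

The text reports 500 permutations; the example uses 100 only to keep the run short. Comparing rank against rank, rather than each gene against its own permutation distribution, accounts for the fact that the top-ranked gene is the maximum over many genes.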
We analyzed the same lymphoma data set considered in (49) (hereafter
called the WI data), with t-statistics as defined in §2.2.1, to obtain the 100
best-scoring genes according to t-score. The false discovery rate (a measure of
significance that avoids the flood of false positives arising from the multiple
comparisons incurred in microarray experiments (50)) corresponding to these 100
genes was estimated to be 5 × 10⁻⁶. This estimate rests on the assumption that
the t-scores are normally distributed under the null hypothesis that all genes are
similarly distributed in DLBCL and FL. We checked this assumption by randomly
permuting the DLBCL and FL labels in the data. Indeed, the pooled probability
density of the t-scores of all the genes after randomization of the labels has an
average of 0.03, a standard deviation of 1.03, and a kurtosis of 3.2, indicating
reasonable resemblance to a Gaussian distribution (mean 0, standard deviation 1,
kurtosis 3). The 100 genes found using the t-statistic share 42 genes with the
100 best genes found in (49) based on the SNR.
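The permutation check of the normality assumption can be sketched as follows. The definition of §2.2.1 is not reproduced here, so the sketch assumes a Welch-style t-score (difference of class means divided by its standard error); the data are synthetic, and the class sizes and function names are hypothetical.

```python
import random
import statistics

def t_score(a, b):
    """Assumed Welch-style t-score: mean difference over its standard error."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def permuted_t_scores(rows, labels, n_perm, rng):
    """Pool the t-scores of every gene over n_perm random relabelings."""
    pooled = []
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for row in rows:
            a = [x for x, lab in zip(row, shuffled) if lab == 0]
            b = [x for x, lab in zip(row, shuffled) if lab == 1]
            pooled.append(t_score(a, b))
    return pooled

def moments(values):
    """Mean, standard deviation, and non-excess kurtosis (3 for a Gaussian)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    k = sum((v - m) ** 4 for v in values) / (len(values) * s ** 4)
    return m, s, k

# Synthetic null data: 200 genes with no class difference.
rng = random.Random(0)
labels = [0] * 30 + [1] * 20          # hypothetical class sizes
rows = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(200)]
null_mean, null_sd, null_kurt = moments(permuted_t_scores(rows, labels, 20, rng))
```

If the pooled moments come out close to 0, 1, and 3, as the text reports for the WI data, a normal approximation to the null distribution of the t-scores is reasonable.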
Both the SNR and the t-score methods choose genes on the basis of a
univariate criterion. There may be genes whose statistical significance according
to the SNR or t-score method is small but whose significance would be larger if
a multivariate approach were used. To explore this possibility we applied our
multivariate gene expression pattern discovery algorithm, Genes@Work, to the
WI data to generate groups of markers that are differentially expressed in
DLBCL and FL.¹ The union of genes that participated in at least one pattern,
with the parameters described in (46), comprised 100 genes.
Figure 1 summarizes the genes discovered by each method. There are a total
of 210 distinct genes, of which only 17 were reported by all three methods.
Genes@Work chose 52 genes that neither the SNR nor the t-score method chose.
The SNR method chose 34 genes that neither Genes@Work nor the t-score
method found. Similarly, the t-score method found 51 genes that neither of the
other methods found. That each method's 100 most significant genes include
genes the other methods missed reflects the specific question with which each
method interrogates the data. In Genes@Work, a gene must correlate with
other genes through a pattern to be reported. When selected by SNR, on the
other hand, each gene is considered in isolation, and the overlap of its
distributions in DLBCL and FL must be small. Finally, to be selected by the
t-statistic method, a gene's sample averages in DLBCL and FL must differ by
more than the standard error.
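The counts quoted above are enough to recover the remaining regions of the three-set diagram by simple arithmetic. The sketch below assumes that each method contributes exactly 100 genes and that the overlap of 42 between the SNR and t-score lists refers to the same SNR list; the pairwise-only counts it derives are not stated in the text and hold only under those assumptions.

```python
# Counts reported in the text.
n_each = 100                  # genes selected by each method
union_reported = 210          # distinct genes over all three methods
all_three = 17                # genes reported by all three methods
only_gw, only_snr, only_t = 52, 34, 51
snr_and_t = 42                # SNR/t-score overlap, including the 17 shared by all

# Derived regions (assumptions stated in the lead-in).
snr_t_only = snr_and_t - all_three                          # SNR and t-score only
snr_gw_only = n_each - only_snr - snr_t_only - all_three    # SNR and Genes@Work only
t_gw_only = n_each - only_t - snr_t_only - all_three        # t-score and Genes@Work only

# Consistency checks against the remaining reported numbers.
gw_total = only_gw + snr_gw_only + t_gw_only + all_three
union_derived = (only_gw + only_snr + only_t
                 + snr_t_only + snr_gw_only + t_gw_only + all_three)

print(snr_t_only, snr_gw_only, t_gw_only, gw_total, union_derived)
```

The derived Genes@Work total and the derived union both match the reported values (100 and 210), so the quoted figures are mutually consistent.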
The question may arise as to whether the combination of methods we
advocated above is really necessary. Indeed, if by slightly relaxing the threshold of
significance of any method we could engulf most of the genes discovered by a