Biomedical Engineering Reference
In-Depth Information
asymptotically approaches normality. In both cases, the t-test can be used to
test the null hypothesis that the two samples have the same mean, i.e., that the
gene is not differentially expressed. In typical gene expression experiments,
however, we should not assume that the data are normally distributed, especially
if the sample sizes M_1 and M_2 are small. In these cases we can still use the
t-score, but we need to estimate its null distribution by resampling (18-20).
Analyses based on t-scores can also be validated by classification. To do this,
gene selection has to be complemented with a classification scheme such as
k-nearest neighbors, decision trees, support vector machines, or naive Bayes
(21). In these cases, the classification methods take as input the genes whose
t-scores rank highest, and the informativeness of those genes is assessed
according to whether unseen samples can be classified correctly.
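The resampling idea can be illustrated with a label-permutation test. The sketch below is only one possible reading: it assumes an unequal-variance (Welch-type) t-score and a two-sided test, and the names t_score and permutation_p_value, the expression values, and the number of permutations are hypothetical choices, not taken from the cited references.

```python
import random
from statistics import mean, variance


def t_score(x, y):
    """Welch-type t-score: difference of means over its estimated standard error.

    Assumes each class has at least two samples and non-constant values.
    """
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided p-value for the t-score, estimated by shuffling class labels."""
    rng = random.Random(seed)
    observed = abs(t_score(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of samples to the two classes
        if abs(t_score(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up expression values for one gene in two classes (M_1 = M_2 = 5).
class1 = [5.1, 4.8, 5.6, 5.3, 4.9]
class2 = [2.0, 2.4, 1.7, 2.2, 2.6]
p_separated = permutation_p_value(class1, class2)

# A gene whose two classes look alike should get a large p-value.
p_similar = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 3.9])
```

Because the null distribution is built from the data themselves, no normality assumption is needed; the price is that the smallest attainable p-value is limited by the number of distinct label assignments.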
Other t-type statistics have been proposed. One of the most widely used is
the signal-to-noise ratio (SNR) score, first used in an early seminal paper in
gene expression array research (22). Its definition,
\[
\mathrm{SNR}_i = \frac{\mu_i^{(1)} - \mu_i^{(2)}}{\sigma_i^{(1)} + \sigma_i^{(2)}}, \qquad [2]
\]
is appealing because of its simplicity and its intuitive interpretation: it
measures how well separated the distributions of the ith gene are in class 1 and
class 2, with a larger magnitude indicating less overlap.
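As a minimal sketch of how such a score might be computed and used for ranking, the code below assumes the SNR is the difference of the class means divided by the sum of the class standard deviations. The use of the sample standard deviation, the function names, and the toy data are assumptions made for illustration only.

```python
from statistics import mean, stdev


def snr_score(class1, class2):
    """Difference of class means divided by the sum of class standard deviations."""
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def rank_genes_by_snr(expression):
    """Rank genes by |SNR|, largest first.

    expression maps a gene name to a pair (values in class 1, values in class 2).
    """
    scores = {gene: snr_score(c1, c2) for gene, (c1, c2) in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: gene_a separates the classes, gene_b does not.
toy_expression = {
    "gene_b": ([5, 6, 7], [4, 6, 8]),
    "gene_a": ([8, 9, 10], [2, 3, 4]),
}
ranking = rank_genes_by_snr(toy_expression)
# gene_a: means 9 and 3, standard deviations 1 and 1, so SNR = 6 / 2 = 3.0
```

Ranking by the absolute value treats over- and under-expression symmetrically; keeping the sign instead would separate genes that are higher in class 1 from those that are higher in class 2.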
2.2.2. More Methods of Univariate Gene Selection
Aside from t-score-based methods, many other univariate methods of gene
selection have been reported in the recent literature. In (23), for example,
information-theoretic ideas were used to design a gene selection method in
which a gene is selected if one of its M expression values partitions the
patients in such a way that the entropy of the proportions of cases and controls
on each side of the partition is minimized. A maximum likelihood ratio approach
was taken in (24) to rank genes from most to least discriminating between two
classes. Many other methods of gene selection have been proposed. These include
the "ideal discriminator method" (which can be mapped to t-type statistics)
(19), the "Wilcoxon rank sum test" (19), χ² statistics (23), correlation-based
feature selection (23), the Bayesian variable selection approach (25), and the
Use-Fold approach, in which genes are selected if their fold changes are greater
than the corresponding assay noise (26), among others.
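The entropy-based partition idea can be sketched as follows. This is an illustration rather than the method of (23): the text does not say how the entropies on the two sides are combined, so the size-weighted average used here is an assumption, as are the function names and the toy data.

```python
from math import log2


def binary_entropy(labels):
    """Entropy (in bits) of the case/control proportions in a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def best_partition(values, labels):
    """Return (entropy, threshold) for the cut point with the lowest entropy.

    Each observed expression value is tried as a threshold; samples with
    value <= threshold go to one side and the rest to the other.
    """
    n = len(values)
    best = (float("inf"), None)
    for threshold in sorted(set(values)):
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        if not right:
            continue  # skip the trivial partition with everything on one side
        # Assumption: combine the two sides by a size-weighted average.
        entropy = (len(left) * binary_entropy(left)
                   + len(right) * binary_entropy(right)) / n
        if entropy < best[0]:
            best = (entropy, threshold)
    return best


# Hypothetical gene whose expression separates controls (0) from cases (1).
clean = best_partition([1.0, 1.2, 0.9, 3.1, 2.8, 3.3], [0, 0, 0, 1, 1, 1])
# Hypothetical gene with no clean cut point.
mixed = best_partition([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

A gene with a minimum entropy near zero admits a cut point that splits cases from controls almost perfectly, so genes can be ranked by this minimum, lowest first.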