Biomedical Engineering Reference
Figure 4. Schematic of how support vector machines operate. (a) A set of points belonging
to two classes (cases or controls) is used to create a decision boundary (the optimal separating
hyperplane, diagonal line) that optimally separates the two classes. (b) A validation
set of previously unseen examples is classified on the basis of the decision boundary calculated
from the training set. Points that fall below the optimal hyperplane are deemed to be
controls, and those that fall above are deemed to be cases.
have a decision rule, which simply states that new points that fall in the region
where the cases (respectively, controls) fell in the training set will be deemed to
belong to the class of cases (respectively, controls). This is illustrated in Figure
4b, where an independent validation data set is plotted. If we use the optimal
hyperplane as the decision boundary, we find that two cases fall on the
control side, whereas one control falls on the case side. In the example of Figure
4b, we therefore have a false positive (FP) count of one (one control deemed to be a case)
and a false negative (FN) count of two (two cases deemed to be controls). Similarly,
the number of true positives (TP) is 10 (i.e., ten cases deemed to be cases),
and the number of true negatives (TN) is 12 (twelve controls deemed to be
controls).
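The counting step can be made concrete with a short Python sketch. The linear rule `classify` (a score w·x + b greater than zero means "case"), the 0/1 label encoding, and the toy two-dimensional hyperplane are illustrative assumptions, not part of the original analysis; the label vectors are simply rebuilt from the counts quoted above for Figure 4b (12 cases and 13 controls).

```python
from typing import Dict, List, Sequence

# Illustrative label encoding (an assumption, not from the original text).
CASE, CONTROL = 1, 0


def classify(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Linear decision rule: a point on the positive side of the hyperplane
    w.x + b = 0 is called a case; any other point is called a control."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return CASE if score > 0 else CONTROL


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally TP, FP, FN and TN, treating 'case' as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == CASE:
            counts["TP" if truth == CASE else "FP"] += 1
        else:
            counts["FN" if truth == CASE else "TN"] += 1
    return counts


# Hypothetical toy hyperplane x2 - x1 = 0: points above the diagonal are cases.
w_toy: List[float] = [-1.0, 1.0]
b_toy = 0.0
print(classify([1.0, 3.0], w_toy, b_toy))  # 1 (case: above the line)
print(classify([3.0, 1.0], w_toy, b_toy))  # 0 (control: below the line)

# Label vectors rebuilt from the Figure 4b counts:
# 12 cases (10 called cases, 2 called controls) and
# 13 controls (12 called controls, 1 called a case).
y_true = [CASE] * 12 + [CONTROL] * 13
y_pred = [CASE] * 10 + [CONTROL] * 2 + [CONTROL] * 12 + [CASE] * 1

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
print(counts)    # {'TP': 10, 'FP': 1, 'FN': 2, 'TN': 12}
print(accuracy)  # 0.88
```

Keeping the decision rule and the bookkeeping in separate functions means either can be checked on its own; with real hyperplane parameters and expression vectors, the predictions from `classify` would feed directly into `confusion_counts`.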
We created a decision rule using the Mayo data (similar to Figure 4a, but in
81 dimensions) and then applied it to the Columbia (CU) data as a validation set (as
schematized in Figure 4b). When trained on the Mayo data, the support vector
machine attempts to find an 80-dimensional hyperplane that divides the
81-dimensional space into two sides, leaving all the case points on one side of the
hyperplane and all the control points on the other. The genes selected for the
Mayo data allowed us to find an optimal hyperplane that perfectly separates the
Mayo data; that is, the Mayo data are linearly separable. When we apply the
decision boundary learned from the Mayo data to the Columbia data, we find that all
the Columbia subjects are perfectly classified: all the CLL patients
segregate to the same side of the hyperplane as the Mayo Clinic CLL patients.
Likewise, all the control subjects fall on the other side of the hyperplane. (This
classification task was performed within the Genes@Work
software environment.) This perfect classification indicates that the group of genes selected
by our gene selection algorithms contains enough information to determine the
disease status (CLL case or control) of a previously unseen patient.
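The geometry described above can be summarized with the standard hard-margin (maximum-margin) formulation of a linear support vector machine. This is a sketch in notation introduced here (w, b, x_i, y_i and n are ours, with y_i = +1 for a case and y_i = -1 for a control, and each x_i holding the 81 selected gene values for one subject); which SVM variant and settings Genes@Work applied is not stated in this section.

```latex
% x_i \in \mathbb{R}^{81}: values of the 81 selected genes for subject i
% y_i = +1 (case) or y_i = -1 (control), i = 1, \dots, n
\begin{align*}
\text{hyperplane:}\quad & H = \{\, x \in \mathbb{R}^{81} : w^{\top} x + b = 0 \,\},\ \ w \neq 0,\ \ \dim H = 81 - 1 = 80 \\
\text{separability:}\quad & \exists\,(w, b)\ \text{such that}\ y_i\,(w^{\top} x_i + b) > 0\ \text{for all } i \\
\text{maximum margin:}\quad & \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}\ \ \text{subject to}\ \ y_i\,(w^{\top} x_i + b) \ge 1,\ \ i = 1, \dots, n \\
\text{decision rule:}\quad & \hat{y}(x) = \operatorname{sign}(w^{\top} x + b)
\end{align*}
```

Because H is the solution set of a single linear equation in 81 unknowns, it has dimension 80. Linear separability of the training data is exactly the condition under which the maximum-margin constraints can all be satisfied, and a perfect validation result means that every validation subject x yields a sign of w^T x + b matching its true label.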