LASSO
The regularized regression discussed in Sect. 14.1.1, with the lasso regularizer, is used to select individual features. The glmnet R package [10] was used for our experiments.
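As a minimal sketch of this step, lasso-based feature selection with glmnet could be set up as follows. The synthetic data, the binomial family, and all variable names are illustrative assumptions, not the chapter's actual code.

library(glmnet)

set.seed(1)
n <- 92; p <- 100                              # small sample size mirrored only for illustration
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("feat", 1:p)))
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))     # placeholder binary outcome

cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)   # alpha = 1: pure lasso penalty

# Features with nonzero coefficients at the cross-validated lambda are the selected ones.
b <- as.matrix(coef(cv_fit, s = "lambda.min"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")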
GL
The group lasso algorithm discussed in Sect. 14.2.1 is used to select grouped features.
For solving group lasso problems, the grplasso [16] or the SGL [22] R packages can be used. The latter is designed for the sparse group lasso, but it can solve group lasso problems by setting the parameter α = 0, so that the sparse group lasso formulation in (14.9) is optimized without an ℓ₁ term. We used the SGL package for our experiments.
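As a sketch, a plain group lasso fit with the SGL package sets α = 0; the grouping, the placeholder data, and the logit response type below are assumptions made only for illustration.

library(SGL)

set.seed(1)
n <- 92; p <- 40
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, 0.5)                 # placeholder binary outcome
index <- rep(1:10, each = 4)           # 10 feature groups of size 4 (assumed grouping)

# alpha = 0 drops the l1 term from the sparse group lasso objective in (14.9),
# leaving only the groupwise penalty.
fit <- SGL(data = list(x = x, y = y), index = index, type = "logit", alpha = 0)

# fit$beta holds one coefficient vector per value on the lambda path; a group is
# excluded at a given lambda when all of its coefficients are zero.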
SGL
The sparse group lasso discussed in Sect. 14.2.3 is used to perform both groupwise
and within-group individual feature selection. For the analysis we used the SGL package.
Note that the parameter α in (14.9) can be chosen to solve the lasso problem by setting α = 1, or the group lasso problem by setting α = 0. An optimal value of α can be determined, for instance, by cross validation, searching on a two-dimensional grid for both λ and α. For the purpose of demonstration, we used a fixed value α = 0.95.
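As a sketch of this choice, the fixed-α fit and a simple two-dimensional search over λ and α could look like the following; the placeholder data, the grid of α values, and the use of the cvSGL output (the lldiff component) are assumptions, not the chapter's code.

library(SGL)

set.seed(1)
n <- 92; p <- 40
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, 0.5)
index <- rep(1:10, each = 4)
dat <- list(x = x, y = y)

# Fixed alpha = 0.95, as in the demonstration above
fit <- SGL(dat, index, type = "logit", alpha = 0.95)

# Two-dimensional search: for each alpha on a grid, cvSGL evaluates a lambda path;
# the (alpha, lambda) pair with the smallest cross-validated error is kept.
alphas <- c(0, 0.25, 0.5, 0.75, 0.95, 1)
cv_err <- sapply(alphas, function(a) {
  cv <- cvSGL(dat, index, type = "logit", alpha = a)
  min(cv$lldiff)                       # best lambda for this alpha
})
best_alpha <- alphas[which.min(cv_err)]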
14.3.3 Comparison of Performance
14.3.3.1 Prediction Performance
Since the entire data set is rather small (n = 92), instead of dividing the set into a training and a test set once, we performed random subsampling: we repeatedly chose 70% of the patient indices at random (without replacement) for training and used the rest for testing. For each trial, we measured the prediction performance on the test set of the predictor trained on the corresponding training set.
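A minimal sketch of this repeated subsampling loop, using glmnet as the predictor and pROC for the AUC; both library choices, the synthetic data, and all names are illustrative assumptions rather than the chapter's actual setup.

library(glmnet)
library(pROC)

set.seed(2)
n <- 92; p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))   # placeholder binary outcome

aucs <- replicate(20, {
  train <- sample(n, size = floor(0.7 * n))  # 70% of indices, drawn without replacement
  fit <- cv.glmnet(x[train, ], y[train], family = "binomial", alpha = 1)
  prob <- predict(fit, newx = x[-train, ], s = "lambda.min", type = "response")
  as.numeric(auc(roc(y[-train], as.vector(prob), quiet = TRUE)))
})
summary(aucs)                                # distribution of test AUC over the trials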
Figure 14.3 shows the AUC (area under the curve) [11, 20] scores from 20 random subsampling trials. The AUC score (left panel) is improved by grouped selection (GL and SGL) compared to individual selection (LASSO). However, grouped selection resulted in choosing a larger number of features than LASSO (right panel). As expected, fewer features were chosen by SGL than by GL, but at the cost of a small loss in prediction performance.
In fact, the number of selected features is closely related to the cost of clinical tests built upon the chosen features. All numbers were relatively small (≤ 100) in our case; however, some would prefer a smaller number of features to reduce cost if the degradation in prediction were not significant. In this regard, SGL in Fig. 14.3 appeared to provide a good compromise between the number of features and prediction performance.