14.4 Discussion
Lasso regularization (Sect. 14.1.1) also appears in many other fields, for instance in compressed sensing [4, 5, 8], which solves an optimization problem of the form (14.1) with a least-squares loss function and an $\ell_1$ regularizer.
In compressed sensing, a signal of length $p$ is recovered from few observations (small $n$), under the assumption that the original signal is sparse. Exact recovery is guaranteed with high probability under certain conditions.
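As an illustration, the following sketch recovers a sparse signal from $n \ll p$ random measurements by solving the $\ell_1$-regularized least-squares problem; the Gaussian sensing matrix, sparsity level, and regularization weight are illustrative assumptions, not values from the text.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, n, k = 200, 60, 5                       # signal length, observations (n << p), nonzeros

beta_true = np.zeros(p)
beta_true[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)

X = rng.normal(size=(n, p)) / np.sqrt(n)   # random Gaussian sensing matrix
y = X @ beta_true                          # noiseless measurements

# Solve (1/2n)||y - X beta||_2^2 + alpha * ||beta||_1
beta_hat = Lasso(alpha=0.01, max_iter=10000).fit(X, y).coef_
print("true support:     ", np.flatnonzero(beta_true))
print("recovered support:", np.flatnonzero(np.abs(beta_hat) > 1e-6))
```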
For group lasso (Sect. 14.2.1) with the ordinary least-squares loss function, there are alternative methods, including group LARS (Least Angle Regression) and the group non-negative garrotte [24]. These methods have slightly different characteristics in their solution paths. Also, group LARS usually scales much better than group lasso.
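For concreteness, here is a minimal sketch of one standard way to solve the group lasso itself, proximal gradient descent with blockwise soft-thresholding; it is not the group-LARS or garrotte algorithm of [24], and the group structure, step size, and toy data are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: the proximal operator of t * ||v||_2."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, iters=500):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2,
    where `groups` is assumed to partition the feature indices."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2              # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(iters):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step on the smooth part
        for g in groups:                              # groupwise proximal step
            beta[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta

# Toy usage: only the first group carries signal, so the second is shrunk to zero.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))
y = X[:, :3] @ np.ones(3) + 0.1 * rng.normal(size=50)
print(group_lasso(X, y, groups=[[0, 1, 2], [3, 4, 5]], lam=0.2))
```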
In Sect. 14.2.2, we introduced a naive approach that reformulates overlapping group lasso as group lasso by replicating features that belong to multiple groups. However, this approach increases the dimension of the optimization problem and therefore may not be preferable when $p$ is large. Several optimization algorithms exist that do not require such replication [13, 25, 28].
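A minimal sketch of the replication reformulation, with hypothetical helper names; it shows how a feature shared by two groups is duplicated, so the expanded problem has more columns than the original.

```python
import numpy as np

def replicate_features(X, groups):
    """Duplicate columns shared by several groups so the groups become disjoint.
    Returns the expanded design and the disjoint groups as index lists
    into the expanded columns."""
    cols, new_groups, offset = [], [], 0
    for g in groups:
        cols.append(X[:, g])                   # copy this group's columns
        new_groups.append(list(range(offset, offset + len(g))))
        offset += len(g)
    return np.hstack(cols), new_groups

X = np.random.default_rng(2).normal(size=(10, 4))
groups = [[0, 1, 2], [2, 3]]                   # feature 2 belongs to both groups
X_rep, disjoint = replicate_features(X, groups)
print(X_rep.shape, disjoint)                   # (10, 5) [[0, 1, 2], [3, 4]]
```

A coefficient of the original problem is then recovered by summing the coefficients of its replicated copies, which is why the dimension, and hence the cost, grows with the amount of overlap.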
When the dimension of the data is much larger than the sample size ($p \gg n$), the solution of the optimization problem in (14.1) can vary with even small changes in the sample. Denoting by $\hat{\beta}^n$ the estimate obtained by solving (14.1) with a sample of size $n$, and by $\beta^*$ the true unknown parameter, we can define the notion of consistency in terms of variable selection,
$$P\left(\{j : \hat{\beta}^n_j \neq 0\} = \{j : \beta^*_j \neq 0\}\right) \to 1, \quad \text{as } n \to \infty.$$
When a method is consistent in terms of variable selection and the convergence above is fast enough, then a small $n$ may not matter much, as the estimate $\hat{\beta}^n$ will be close to $\beta^*$.
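The sketch below estimates the support-recovery probability $P(\{j : \hat{\beta}^n_j \neq 0\} = \{j : \beta^*_j \neq 0\})$ by Monte Carlo simulation for a few sample sizes; the data-generating model and the fixed regularization weight are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, k, trials = 50, 3, 100
beta_star = np.zeros(p)
beta_star[:k] = 2.0                            # a strong, sparse true signal
true_support = frozenset(range(k))

for n in (25, 100, 400):
    hits = 0
    for _ in range(trials):
        X = rng.normal(size=(n, p))
        y = X @ beta_star + rng.normal(size=n)
        coef = Lasso(alpha=0.2, max_iter=10000).fit(X, y).coef_
        hits += frozenset(np.flatnonzero(np.abs(coef) > 1e-6)) == true_support
    print(f"n={n:4d}  estimated recovery probability: {hits / trials:.2f}")
```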
Lasso produces consistent estimates when some strong conditions hold [17, 27]. Unfortunately, features from high-throughput genomic profiling are typically highly correlated, and these conditions often break for such data. Reference [2] has shown that, under a fixed $p$ and specific choices of the regularization parameter $\lambda$, the intersection of features selected by bootstrapped lasso estimates is consistent under less restrictive conditions. Reference [18] has proposed the randomized lasso method, which potentially has better consistency. This issue has also been studied in bioinformatics in terms of stable feature selection [1, 7]. As high-throughput profiling technologies develop, the dimension of the data keeps growing, so consistency remains a challenging topic for research.
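A minimal sketch of the bootstrap-intersection idea behind [2]: run the lasso on bootstrap resamples and keep only the features selected in every resample. The number of resamples and the regularization weight are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bootstrap_intersection(X, y, alpha=0.1, n_boot=32, seed=0):
    """Return the features selected by the lasso on every bootstrap resample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = None
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # sample n rows with replacement
        coef = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx]).coef_
        support = set(np.flatnonzero(np.abs(coef) > 1e-6))
        selected = support if selected is None else selected & support
    return sorted(selected)
```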
14.5 Conclusion
The rapid growth of dimensionality in modern high-throughput measurement technologies requires us to consider extra information on features, in order to avoid adverse effects of high dimensionality such as overfitting. Information on groupings