14.1.1.2 Regularizers
A popular choice of the regularizer Ω(β) in (14.1) to perform feature selection (without considering feature groups) is the ℓ1 norm of β, also known as the lasso penalty [23],

\Omega(\beta) := \lambda \|\beta\|_1 = \lambda \sum_{j=1}^{p} |\beta_j|,    (14.2)

for a given λ > 0. This is a convex function in β. The bias term β_0 is usually not included in the regularization. A property of the lasso penalty is that when a feature is not important for fitting the responses with respect to a given value of λ, the lasso penalty sets the corresponding coefficient in β exactly to zero. So there is no need for thresholding to filter out irrelevant features after finding solutions of (14.1). In fact, the value of λ plays a role similar to a threshold value, as we will see later.
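As an illustration of this exact-zero behavior, the following sketch fits a lasso model on synthetic data in which only the first three features carry signal; the fitted coefficient vector then contains exact zeros for the irrelevant features, with no thresholding step needed. This is a minimal illustration, not code from the chapter: the data sizes, the noise level, and the use of scikit-learn (whose `alpha` argument plays the role of λ) are assumptions made for the example.

```python
# Minimal sketch (illustrative data, not from the chapter): the lasso drives
# coefficients of irrelevant features to exactly zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n, p = 100, 20                       # sample size and number of features (assumed)
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]     # only the first three features are relevant
y = X @ beta_true + 0.1 * rng.normal(size=n)

# scikit-learn's `alpha` corresponds (up to scaling) to the parameter lambda above.
model = Lasso(alpha=0.1).fit(X, y)

print("estimated coefficients:", np.round(model.coef_, 3))
print("selected features:", np.flatnonzero(model.coef_))
```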
When features are correlated, the lasso tends to select only a few of the correlated features (in an unstable way, especially when p ≫ n). This is not desirable when all correlated features may matter and therefore have to be selected. A remedy for this behavior is the elastic net regularization [26], which augments Ω above as follows,
\Omega(\beta) := \lambda \big( \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|_2^2 \big).    (14.3)
Here ‖β‖₂ is the ℓ2 norm (the Euclidean norm) of β. The parameter α ∈ [0, 1] controls the mixing of the ℓ1 and ℓ2 regularizers: the case α = 0 is often referred to as ridge regression, and for α = 1 it becomes the lasso penalty. The elastic net tends to select all correlated features when they are relevant. So correlated groups of features will be identified, but they may not correspond to known groups of features.
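The difference between the two penalties on correlated inputs can be seen in a small experiment such as the one sketched below. The data are illustrative assumptions (two nearly identical noisy copies of one signal plus an irrelevant feature), and scikit-learn's `l1_ratio` plays the role of the mixing parameter α in (14.3). The lasso typically keeps only one of the two copies, whereas the elastic net tends to spread nonzero weight over both.

```python
# Minimal sketch (illustrative data): lasso vs. elastic net on two strongly
# correlated copies of the same underlying signal.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(1)

n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.01 * rng.normal(size=n),   # feature 0: noisy copy of z
    z + 0.01 * rng.normal(size=n),   # feature 1: another noisy copy of z
    rng.normal(size=n),              # feature 2: irrelevant
])
y = 3.0 * z + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # l1_ratio ~ alpha in (14.3)

print("lasso coefficients:      ", np.round(lasso.coef_, 3))
print("elastic net coefficients:", np.round(enet.coef_, 3))
```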
The rest of this chapter is organized as follows. In Sect. 14.2, the extensions of the lasso, namely the group lasso (Sect. 14.2.1), the overlapping group lasso (Sect. 14.2.2), and the sparse group lasso (Sect. 14.2.3), are introduced, and their properties and differences are discussed. A case study on exon microarray data follows in Sect. 14.3, demonstrating a possible use of grouped feature selection in bioinformatics. Some technical issues of the methods are discussed in Sect. 14.4, followed by conclusions in Sect. 14.5.
14.2 Regularized Regression Methods for Grouped Features
When group information on features is available, we can impose it as an extra constraint for feature selection. Suppose that the p features are grouped into K groups, where we represent each of the groups G_1, G_2, ..., G_K as a subset of feature indices, that is, G_k ⊆ {1, 2, ..., p}. For simplicity we assume that all features have their groups assigned, in other words ⋃_{k=1}^{K} G_k = {1, 2, ..., p}.
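In code, this group structure can be represented directly as index sets. The sketch below (group names and sizes are hypothetical, and 0-based column indices are used) stores each G_k as a set of feature indices and checks that the groups together cover all p features, matching the assumption above.

```python
# Minimal sketch (hypothetical groups): G_1, ..., G_K as index sets over the
# p feature columns, with a check that every feature belongs to some group.
p = 10
groups = {
    "G1": {0, 1, 2},
    "G2": {3, 4, 5, 6},
    "G3": {7, 8, 9},
}

covered = set().union(*groups.values())
assert covered == set(range(p)), "every feature must be assigned to a group"

for name, idx in sorted(groups.items()):
    print(name, "->", sorted(idx))
```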