In other words, smaller groups are more likely to be selected. When this behavior is undesirable, we can use this scaling to impose equal amounts of regularization on groups of different sizes.
14.2.2 Overlapping Group Lasso
So far, there has been no consideration of overlaps between groups. The graph lasso discussed in Sect. 14.2.1, however, may lead to incorrect feature selection when groups overlap. Let us take a simple example from [12], with β = (β_1, β_2, β_3)^T, G_1 = {1, 2}, and G_2 = {2, 3}. When we select none or both of G_1 and G_2, there is no issue. But selecting only one of them is not permitted by the group lasso regularizer Ω_G. That is, if we select G_1 (assigning nonzero values to β_1 and β_2), then G_2 must be selected as well (since ‖β_{G_2}‖_2 ≥ |β_2| > 0). In fact, selecting either G_1 or G_2 alone is allowed only in the very rare case when the optimal value for the shared variable is β_2 = 0: with β_3 = 0 as well, β_2 = 0 implies that ‖β_{G_2}‖_2 = |β_2| = 0. Theorem 1 tells us that such a case can happen when G_1 is selected and ∂f(β)/∂β_2 = 0. However, this is not a property of the regularizer Ω_G: it is not capable of selecting only one of two overlapping groups.
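A minimal NumPy sketch (illustrative only, not code from the text; 0-based indices, so G_1 = {1, 2} and G_2 = {2, 3} become [0, 1] and [1, 2]) makes this concrete: any β whose shared coefficient is nonzero yields a positive norm for both group blocks, so Ω_G charges both groups.

```python
import numpy as np

# Group lasso regularizer Omega_G over (possibly overlapping) groups:
# Omega_G(beta) = lam * sum_k ||beta_{G_k}||_2
def omega_g(beta, groups, lam=1.0):
    return lam * sum(np.linalg.norm(beta[g]) for g in groups)

groups = [[0, 1], [1, 2]]  # G_1 = {1, 2}, G_2 = {2, 3} in the text's 1-based notation

# Trying to "select only G_1": beta_1 and beta_2 nonzero, beta_3 = 0.
beta = np.array([1.0, 1.0, 0.0])

# The shared coefficient makes BOTH group norms nonzero, so the
# regularizer treats G_2 as selected too.
norms = [np.linalg.norm(beta[g]) for g in groups]
print(norms)  # both entries are positive
```

Only when the shared coefficient is exactly zero does the second group's norm vanish, matching the "very rare case" described above.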
For the case of overlapping groups, we can use an alternative regularizer proposed by [12]. The idea is rather simple: we consider a separate coefficient vector γ^k ∈ ℝ^p for each group k = 1, 2, ..., K, which can have nonzero values only for the components j ∈ G_k. Using these vectors, the regularizer for overlapping groups is defined by

$$\Omega_O(\beta) := \lambda \inf_{\sum_{k=1}^{K} \gamma^k = \beta} \; \sum_{k=1}^{K} \left\| \gamma^k \right\|_2 . \tag{14.8}$$
Applying this to our simple example above, we have γ^1 = (γ^1_1, γ^1_2, 0)^T for G_1 = {1, 2} and γ^2 = (0, γ^2_2, γ^2_3)^T for G_2 = {2, 3}. Now only one of the groups can be selected, since we have separate variables γ^1_2 and γ^2_2 for the second feature, and therefore choosing γ^1_2 ≠ 0 does not necessarily imply γ^2_2 ≠ 0, for instance. The coefficient vector β is determined by β = γ^1 + γ^2.
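As an illustrative sketch (not code from the text), the infimum in Eq. (14.8) can be approximated numerically for this three-variable example: only the shared coordinate β_2 can be split between γ^1 and γ^2, so a one-dimensional grid search over the split t with γ^1 = (β_1, t, 0)^T and γ^2 = (0, β_2 − t, β_3)^T suffices.

```python
import numpy as np

# Approximate Omega_O(beta) from Eq. (14.8) for G_1 = {1, 2}, G_2 = {2, 3}
# by grid search over the split of the shared coordinate (illustrative
# only, not a real solver).
def omega_o(beta, lam=1.0, n_grid=2001):
    b1, b2, b3 = beta
    ts = np.linspace(-2 * abs(b2) - 1, 2 * abs(b2) + 1, n_grid)
    # ||gamma^1||_2 + ||gamma^2||_2 for each candidate split t
    vals = [np.hypot(b1, t) + np.hypot(b2 - t, b3) for t in ts]
    return lam * min(vals)

# beta supported on G_1 only: the best decomposition puts everything into
# gamma^1 (t = beta_2) and sets gamma^2 = 0, so G_2 need not be selected.
beta = np.array([1.0, 1.0, 0.0])
print(omega_o(beta))  # close to ||(1, 1)||_2 = sqrt(2)
```

For comparison, the non-overlapping penalty Ω_G would charge this β the larger amount ‖(1, 1)‖_2 + ‖(1, 0)‖_2 = √2 + 1, because both groups contain the nonzero shared coefficient.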
14.2.2.1 Structure Induced by Ω_G and Ω_O
The difference between Ω_G and Ω_O can be understood more clearly by investigating the structure they induce on the coefficient vector β. In particular, we discuss the set of nonzero components of β, {j : β_j ≠ 0}, which is equal to the set of features to be selected.
In the case of Ω_G from the group lasso, we have

$$\{ j : \beta_j \neq 0 \} \;\subseteq\; \bigcap_{k \,:\, \beta_{G_k} = 0} G_k^c .$$
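This containment can be checked on a small example with a NumPy sketch (illustrative only, not code from the text; 0-based indices): the selected features must lie in the intersection of the complements G_k^c of the groups whose coefficient blocks are entirely zero.

```python
import numpy as np

# Upper bound on the support induced by the group lasso: intersect the
# complements G_k^c of all groups k with beta_{G_k} = 0.
def induced_support_bound(beta, groups):
    p = len(beta)
    allowed = set(range(p))
    for g in groups:
        if np.all(beta[g] == 0):               # group with beta_{G_k} = 0
            allowed &= set(range(p)) - set(g)  # intersect with G_k^c
    return allowed

groups = [[0, 1], [1, 2], [3]]
beta = np.array([0.5, 0.2, 0.0, 0.0])  # only the block of group [3] is zero

support = {j for j in range(len(beta)) if beta[j] != 0}
print(support <= induced_support_bound(beta, groups))  # True
```

Here the bound is {0, 1, 2}, the complement of the zeroed group [3], and the actual support {0, 1} is contained in it.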
 