In other words, smaller groups are more likely to be selected. When this behavior is undesirable, we can use this scaling to impose equal amounts of regularization on groups of different sizes.
14.2.2 Overlapping Group Lasso
So far, there has been no consideration of overlaps between groups. The graph lasso discussed in Sect. 14.2.1, however, may lead to incorrect feature selection when groups overlap. Let us take a simple example from [12], with β = (β_1, β_2, β_3)^T, G_1 = {1, 2}, and G_2 = {2, 3}. When we select none or both of G_1 and G_2, there is no issue. But selecting only one of them is not permitted by the group lasso regularizer Ω_G. That is, if we select G_1 (assigning nonzero values to β_1 and β_2), then G_2 must be selected as well (since ‖β_{G_2}‖_2 ≥ |β_2| > 0). In fact, selecting either G_1 or G_2 alone is allowed only in the very rare case when the optimal value for the shared variable is β_2 = 0: with β_3 = 0 as well, β_2 = 0 implies that ‖β_{G_2}‖_2 = |β_2| = 0. Theorem 1 tells us that such a case can happen when G_1 is selected and ∂f(β)/∂β_2 = 0. However, this is not a property of the regularizer Ω_G: it is not capable of selecting only one of two overlapping groups.
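A minimal NumPy sketch (illustrative only, not code from the text; 0-based indices, so G_1 = {1, 2} and G_2 = {2, 3} become [0, 1] and [1, 2]) makes this concrete: any β whose shared coefficient is nonzero yields a positive norm for both group blocks, so Ω_G charges both groups.

```python
import numpy as np

# Group lasso regularizer Omega_G over (possibly overlapping) groups:
# Omega_G(beta) = lam * sum_k ||beta_{G_k}||_2
def omega_g(beta, groups, lam=1.0):
    return lam * sum(np.linalg.norm(beta[g]) for g in groups)

groups = [[0, 1], [1, 2]]  # G_1 = {1, 2}, G_2 = {2, 3} in the text's 1-based notation

# Trying to "select only G_1": beta_1 and beta_2 nonzero, beta_3 = 0.
beta = np.array([1.0, 1.0, 0.0])

# The shared coefficient makes BOTH group norms nonzero, so the
# regularizer treats G_2 as selected too.
norms = [np.linalg.norm(beta[g]) for g in groups]
print(norms)  # both entries are positive
```

Only when the shared coefficient is exactly zero does the second group's norm vanish, matching the "very rare case" described above.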
For the case of overlapping groups, we can use an alternative regularizer proposed by [12]. The idea is rather simple: we consider a separate coefficient vector γ^k ∈ ℝ^p for each group k = 1, 2, ..., K, which can have nonzero values only for the components j ∈ G_k. Using these vectors, the regularizer for overlapping groups is defined by

$$\Omega_O(\beta) := \lambda \inf_{\sum_{k=1}^{K} \gamma^k = \beta} \; \sum_{k=1}^{K} \left\| \gamma^k \right\|_2 . \tag{14.8}$$
Applying this to our simple example above, we have γ^1 = (γ^1_1, γ^1_2, 0)^T for G_1 = {1, 2} and γ^2 = (0, γ^2_2, γ^2_3)^T for G_2 = {2, 3}. Now only one of the groups can be selected, since we have separate variables γ^1_2 and γ^2_2 for the second feature, and therefore choosing γ^1_2 ≠ 0 does not necessarily imply γ^2_2 ≠ 0, for instance. The coefficient vector β is determined by β = γ^1 + γ^2.
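As an illustrative sketch (not code from the text), the infimum in Eq. (14.8) can be approximated numerically for this three-variable example: only the shared coordinate β_2 can be split between γ^1 and γ^2, so a one-dimensional grid search over the split t with γ^1 = (β_1, t, 0)^T and γ^2 = (0, β_2 − t, β_3)^T suffices.

```python
import numpy as np

# Approximate Omega_O(beta) from Eq. (14.8) for G_1 = {1, 2}, G_2 = {2, 3}
# by grid search over the split of the shared coordinate (illustrative
# only, not a real solver).
def omega_o(beta, lam=1.0, n_grid=2001):
    b1, b2, b3 = beta
    ts = np.linspace(-2 * abs(b2) - 1, 2 * abs(b2) + 1, n_grid)
    # ||gamma^1||_2 + ||gamma^2||_2 for each candidate split t
    vals = [np.hypot(b1, t) + np.hypot(b2 - t, b3) for t in ts]
    return lam * min(vals)

# beta supported on G_1 only: the best decomposition puts everything into
# gamma^1 (t = beta_2) and sets gamma^2 = 0, so G_2 need not be selected.
beta = np.array([1.0, 1.0, 0.0])
print(omega_o(beta))  # close to ||(1, 1)||_2 = sqrt(2)
```

For comparison, the non-overlapping penalty Ω_G would charge this β the larger amount ‖(1, 1)‖_2 + ‖(1, 0)‖_2 = √2 + 1, because both groups contain the nonzero shared coefficient.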
14.2.2.1 Structure Induced by Ω_G and Ω_O
The difference between Ω_G and Ω_O can be understood more clearly by investigating the structure they induce on the coefficient vector β. In particular, we discuss the set of nonzero components of β, {j : β_j ≠ 0}, which is equal to the set of features to be selected.
In the case of Ω_G from the group lasso, we have

$$\{ j : \beta_j \neq 0 \} \;\subseteq\; \bigcap_{k \,:\, \beta_{G_k} = 0} G_k^c .$$
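This containment can be checked on a small example with a NumPy sketch (illustrative only, not code from the text; 0-based indices): the selected features must lie in the intersection of the complements G_k^c of the groups whose coefficient blocks are entirely zero.

```python
import numpy as np

# Upper bound on the support induced by the group lasso: intersect the
# complements G_k^c of all groups k with beta_{G_k} = 0.
def induced_support_bound(beta, groups):
    p = len(beta)
    allowed = set(range(p))
    for g in groups:
        if np.all(beta[g] == 0):               # group with beta_{G_k} = 0
            allowed &= set(range(p)) - set(g)  # intersect with G_k^c
    return allowed

groups = [[0, 1], [1, 2], [3]]
beta = np.array([0.5, 0.2, 0.0, 0.0])  # only the block of group [3] is zero

support = {j for j in range(len(beta)) if beta[j] != 0}
print(support <= induced_support_bound(beta, groups))  # True
```

Here the bound is {0, 1, 2}, the complement of the zeroed group [3], and the actual support {0, 1} is contained in it.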
 