To facilitate our discussion, we denote a subvector of a $p$-dimensional vector $\beta$ that corresponds to a group $G_k$ by $\beta_{G_k} := (\beta_j)_{j \in G_k} \in \mathbb{R}^{|G_k|}$. Here we consider a fixed permutation of the elements in $\beta_{G_k}$ given by the sorted indices in $G_k$ in increasing order. For example, if $\beta = (a, b, c)^T$ and $G_1 = \{1, 3\}$, then $\beta_{G_1} = (a, c)^T$, not $(c, a)^T$. Also, the cardinality of a finite set $G_k$ is denoted by $|G_k|$. The definitions of norms and inner products apply naturally to these subvectors, for example $\|\beta_{G_k}\|_2 = (\sum_{j \in G_k} \beta_j^2)^{1/2}$ and $\beta_{G_k}^T x_{G_k} = \sum_{j \in G_k} \beta_j x_j$.
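As a concrete illustration of this notation, the following NumPy sketch (variable names are my own; indices are 0-based here, rather than the 1-based convention in the text) extracts a group subvector in sorted index order and evaluates the norm and inner product defined above.

```python
import numpy as np

beta = np.array([1.0, -2.0, 3.0])   # beta = (a, b, c)^T
x = np.array([0.5, 1.0, 2.0])

G1 = sorted({0, 2})                 # group indices, kept in increasing order
beta_G1 = beta[G1]                  # subvector beta_{G_1} = (a, c)^T

norm = np.sqrt(np.sum(beta_G1 ** 2))   # ||beta_{G_1}||_2
inner = np.dot(beta_G1, x[G1])         # beta_{G_1}^T x_{G_1}
```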
14.2.1 Group Lasso
Group lasso [24] is an extension of the lasso penalty discussed in Sect. 14.1.1 for grouped features. In group lasso we solve the following optimization problem,

$$\min_{\beta_0, \beta} \; f(\beta_0, \beta) + \Omega_G(\beta), \qquad \Omega_G(\beta) := \lambda \sum_{k=1}^{K} \|\beta_{G_k}\|_2. \tag{14.4}$$

Here note that the $\ell_2$ norm in the summation is not squared: otherwise $\Omega_G$ becomes equivalent to $\lambda \|\beta\|_2^2$ when groups are not overlapping.
The regularizer $\Omega_G$ that replaces the $\ell_1$ norm of $\beta$ in lasso is often referred to as the group $\ell_1/\ell_2$ norm. The reason behind the name becomes obvious when we reformulate $\Omega_G$ by defining a new vector $z$,

$$\Omega_G(\beta) = \lambda \sum_{k=1}^{K} \|\beta_{G_k}\|_2 = \lambda \|z\|_1, \qquad z := \left( \|\beta_{G_1}\|_2, \ldots, \|\beta_{G_K}\|_2 \right)^T.$$
That is, $\Omega_G$ computes an $\ell_1$ norm over groups, and an $\ell_2$ norm for each group. As a result, group lasso selects the few groups that are relevant for a given task, just as lasso selects the few relevant features. If a group of features is chosen, then all of the features inside it are typically selected. Otherwise, all corresponding features will have zero coefficients assigned.
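The identity $\Omega_G(\beta) = \lambda \|z\|_1$ can be checked numerically. The sketch below (illustrative values and function names of my own, with non-overlapping groups assumed) evaluates the penalty both groupwise and through $z$.

```python
import numpy as np

def omega_G(beta, groups, lam):
    """Group lasso penalty: lam * sum_k ||beta_{G_k}||_2."""
    return lam * sum(np.linalg.norm(beta[g]) for g in groups)

beta = np.array([3.0, 4.0, 0.0, 0.0, 5.0])
groups = [[0, 1], [2, 3], [4]]
lam = 0.1

direct = omega_G(beta, groups, lam)
# z collects the groupwise l2 norms; the penalty is an l1 norm over z
z = np.array([np.linalg.norm(beta[g]) for g in groups])
via_z = lam * np.linalg.norm(z, 1)
```

Note how the second group, being all zeros, contributes nothing to the penalty, which is exactly the mechanism behind groupwise sparsity.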
14.2.1.1 Computation of Solutions
To see how group lasso performs groupwise feature selection, we need to understand
the characterizations of its solutions. In this section we consider group lasso defined
with the loss function of the ordinary least squares discussed in Sect. 14.1.1.1.
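As an aside, one standard way to compute group lasso solutions for the squared loss (a sketch of a common approach, not necessarily the derivation this section follows; all names are my own) is proximal gradient descent, whose key ingredient is groupwise soft-thresholding: each subvector is shrunk toward zero and set exactly to zero when its norm falls below the threshold.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Prox of t*||.||_2: shrink v toward 0, zeroing it entirely if ||v||_2 <= t."""
    nv = np.linalg.norm(v)
    if nv <= t:
        return np.zeros_like(v)
    return (1.0 - t / nv) * v

def group_lasso_prox_grad(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 0.5*||y - X beta||^2 + lam * sum_k ||beta_{G_k}||_2."""
    n, p = X.shape
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size from the gradient's Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)        # gradient of the squared loss
        beta = beta - lr * grad
        for g in groups:                    # groupwise proximal step
            beta[g] = group_soft_threshold(beta[g], lr * lam)
    return beta
```

On a toy problem where only the first group carries signal, the second group's coefficients are driven exactly to zero, while the first group is uniformly shrunk.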