$$
f(\gamma_0, \gamma) := \sum_{i=1}^{n} \log\Big( 1 + \exp\big( -y_i (\gamma_0 + \gamma^T \tilde{x}_i) \big) \Big).
$$

Finally, we define new groups

$$
H_k := \Big\{ \, j \;:\; \sum_{s=1}^{k-1} |G_s| + 1 \,\le\, j \,\le\, \sum_{s=1}^{k} |G_s| \Big\}
$$

for $k = 1, 2, \ldots, K$. Then the optimization problem for overlapping group lasso can be written in the same form as that of group lasso in (14.4),

$$
\min_{\gamma_0, \gamma} \; f(\gamma_0, \gamma) + \lambda \sum_{k=1}^{K} \| \gamma_{H_k} \|_2 .
$$
Therefore, the same algorithm as for group lasso can be applied to overlapping group lasso.
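To make the reformulation concrete, the following is a minimal Python sketch of the column-duplication step, assuming the overlapping groups $G_1, \ldots, G_K$ are given as lists of column indices; the function `expand_overlapping_groups` and its interface are illustrative, not from this chapter:

```python
import numpy as np

def expand_overlapping_groups(X, groups):
    """Duplicate the columns of X so that possibly overlapping groups
    G_1, ..., G_K become non-overlapping blocks H_1, ..., H_K of the
    expanded design matrix X_tilde."""
    blocks, H, start = [], [], 0
    for G in groups:
        blocks.append(X[:, G])                      # copy the columns in G_s
        H.append(np.arange(start, start + len(G)))  # H_k is a consecutive block
        start += len(G)
    return np.hstack(blocks), H

# Example: two groups sharing feature 2.
X = np.random.randn(10, 4)
X_tilde, H = expand_overlapping_groups(X, [[0, 1, 2], [2, 3]])
# X_tilde has 5 columns; H = [array([0, 1, 2]), array([3, 4])] do not overlap.
```

A group lasso solver run on $(\tilde{X}, \{H_k\})$ returns the expanded coefficient vector $\gamma$, which maps back to the original $\beta$ by summing its entries over all copies of each column.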
14.2.3 Sparse Group Lasso
In group lasso (Sect. 14.2.1) and overlapping group lasso (Sect. 14.2.2), the coefficient subvectors corresponding to selected groups are not necessarily sparse, as briefly discussed in Sect. 14.2.1.1. This would be undesirable when we try to identify only a few important features inside selected groups.
An alternative is the sparse group lasso proposed by [22]. It tries to induce sparsity in coefficients not only groupwise but also within groups, by using a modified formulation of group lasso in (14.4):
$$
\min_{\beta_0, \beta} \; f(\beta_0, \beta) + \lambda \big\{ \alpha \|\beta\|_1 + (1 - \alpha)\, \Omega_G(\beta) \big\} . \qquad (14.9)
$$
Here $\alpha \in [0, 1]$ is a parameter that determines the mixing of the $\ell_1$ and the group ($\ell_1/\ell_2$) norms, similarly to the mixing constant in elastic net (14.3). Overlapping group lasso discussed in Sect. 14.2.2 can be extended in a similar fashion, by augmenting its objective function with an $\ell_1$ term in addition to the group penalty $\Omega_G$.
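As a concrete reading of (14.9), the sketch below evaluates the sparse group lasso penalty; it assumes $\Omega_G(\beta)$ is the plain sum of group $\ell_2$ norms (some formulations weight each group by $\sqrt{|G_k|}$), and the function name is hypothetical:

```python
import numpy as np

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """lam * ( alpha * ||beta||_1 + (1 - alpha) * Omega_G(beta) ),
    with Omega_G(beta) = sum_k ||beta_{G_k}||_2, as in (14.9)."""
    l1 = np.abs(beta).sum()                               # within-group sparsity
    omega = sum(np.linalg.norm(beta[G]) for G in groups)  # groupwise sparsity
    return lam * (alpha * l1 + (1.0 - alpha) * omega)
```

Setting $\alpha = 1$ recovers the lasso penalty and $\alpha = 0$ recovers group lasso, mirroring the role of the elastic net mixing constant.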
14.2.3.1 Computation of Solutions
Similarly to group lasso, sparse group lasso in (14.9) defines a convex minimization problem, for which necessary and sufficient conditions for a solution can be characterized via the optimality conditions.
To understand the optimality conditions, we need the notion of a subgradient. For a function $h : \mathbb{R}^p \to \mathbb{R}$, a vector $g \in \mathbb{R}^p$ is a subgradient for $h$ at $\beta$ if

$$
h(\beta') \,\ge\, h(\beta) + g^T (\beta' - \beta), \qquad \forall \beta' \in \mathbb{R}^p .
$$
If $h$ is a convex function on $\mathbb{R}^p$, it has a subgradient at every point in $\mathbb{R}^p$. For example, $h(\beta) = \|\beta\|_1$ is a convex function, and its subgradient $g$ at $\beta$ is defined coordinatewise by $g_j = \operatorname{sign}(\beta_j)$ if $\beta_j \neq 0$, and $g_j \in [-1, 1]$ if $\beta_j = 0$.
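As a small numerical check of this definition, the sketch below forms the subgradient just described, choosing $g_j = 0$ at coordinates where $\beta_j = 0$, and verifies the subgradient inequality at random points:

```python
import numpy as np

def l1_subgradient(beta):
    # sign(beta_j) where beta_j != 0; at beta_j == 0 any value in
    # [-1, 1] is valid, and np.sign conveniently returns 0 there.
    return np.sign(beta)

rng = np.random.default_rng(0)
beta = rng.normal(size=5)
g = l1_subgradient(beta)
for _ in range(100):
    other = rng.normal(size=5)
    # h(beta') >= h(beta) + g^T (beta' - beta) for h(.) = ||.||_1
    assert np.abs(other).sum() >= np.abs(beta).sum() + g @ (other - beta) - 1e-12
```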