$$
\tilde f(\gamma, \beta_0) := \sum_{i=1}^{n} \log\!\left( 1 + \exp\!\left( - y_i \left( \beta_0 + \gamma^T \tilde{x}_i \right) \right) \right).
$$

Finally, we define new groups

$$
H_k := \left\{ j \;:\; \sum_{s=1}^{k-1} |G_s| + 1 \;\le\; j \;\le\; \sum_{s=1}^{k} |G_s| \right\}, \quad k = 1, 2, \ldots, K.
$$

Then the optimization problem for overlapping group lasso can be written in the same form as that of group lasso in (14.4),

$$
\min_{\gamma, \beta_0} \; \tilde f(\gamma, \beta_0) + \lambda \sum_{k=1}^{K} \left\| \gamma_{H_k} \right\|_2 .
$$
Therefore, the same algorithm for group lasso can be applied for overlapping group
lasso.
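The reduction above can be sketched in a few lines of Python (an illustrative sketch with assumed names, using a NumPy design matrix): each column of $X$ is duplicated once per group that contains it, producing the replicated matrix $\tilde{X}$ together with the new disjoint groups $H_k$.

```python
import numpy as np

def expand_overlapping_groups(X, groups):
    """Duplicate columns of X so overlapping groups become disjoint.

    X      : (n, p) design matrix
    groups : list of index lists G_1, ..., G_K (may overlap)
    Returns the expanded matrix X_tilde and disjoint groups H_1, ..., H_K,
    where H_k is a contiguous block of columns of X_tilde.
    """
    cols, H, start = [], [], 0
    for G in groups:
        cols.append(X[:, G])                          # replicate the columns in G_k
        H.append(list(range(start, start + len(G))))  # H_k: next |G_k| column indices
        start += len(G)
    return np.hstack(cols), H

# Two groups sharing feature 1: the shared column appears twice in X_tilde.
X = np.arange(12.0).reshape(4, 3)
X_tilde, H = expand_overlapping_groups(X, [[0, 1], [1, 2]])
print(X_tilde.shape)  # (4, 4)
print(H)              # [[0, 1], [2, 3]]
```

An ordinary (non-overlapping) group lasso solver can then be run on $\tilde{X}$ with groups $H_k$, and the coefficients for a shared feature are recovered by summing its replicated entries.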
14.2.3 Sparse Group Lasso
In group lasso (Sect. 14.2.1) and overlapping group lasso (Sect. 14.2.2), the coefficient subvectors corresponding to selected groups are not necessarily sparse, as briefly discussed in Sect. 14.2.1.1. This would be undesirable when we try to identify only a few important features inside selected groups.

An alternative is the sparse group lasso proposed by [22]. It tries to induce sparsity in coefficients not only groupwise but also within groups, by using a modified formulation of group lasso in (14.4):
$$
\min_{\beta, \beta_0} \; f(\beta, \beta_0) + \lambda \left\{ \alpha \|\beta\|_1 + (1 - \alpha)\, \Omega_G(\beta) \right\}. \qquad (14.9)
$$
Here $\alpha \in [0, 1]$ is a parameter that determines the mixing of the $\ell_1$ and the group ($\ell_1/\ell_2$) norms, similarly to a constant in elastic net (14.3). Overlapping group lasso discussed in Sect. 14.2.2 can be extended in a similar fashion, by augmenting its objective function with an $\Omega_{G,1}$ term.
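To make the penalty in (14.9) concrete, the following sketch evaluates $\lambda \{ \alpha \|\beta\|_1 + (1-\alpha)\, \Omega_G(\beta) \}$, assuming the unweighted group penalty $\Omega_G(\beta) = \sum_k \|\beta_{G_k}\|_2$ (function and variable names are illustrative):

```python
import numpy as np

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) * sum_k ||beta_{G_k}||_2)."""
    l1 = np.sum(np.abs(beta))
    group = sum(np.linalg.norm(beta[G]) for G in groups)
    return lam * (alpha * l1 + (1.0 - alpha) * group)

beta = np.array([3.0, -4.0, 0.0, 0.0])
groups = [[0, 1], [2, 3]]
print(sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5))  # 6.0
```

Setting $\alpha = 1$ recovers the plain lasso penalty, while $\alpha = 0$ recovers the group lasso penalty, which is how the mixing parameter interpolates between the two.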
14.2.3.1 Computation of Solutions
Similarly to group lasso, sparse group lasso in (14.9) defines a convex minimization problem, for which necessary and sufficient conditions for optimality can be characterized.
To understand the optimality conditions, we need the notion of subgradient. For a function $h: \mathbb{R}^p \to \mathbb{R}$, $g \in \mathbb{R}^p$ is a subgradient for $h$ at $\beta \in \mathbb{R}^p$ if

$$
h(\beta') \ge h(\beta) + g^T (\beta' - \beta), \quad \forall \beta' \in \mathbb{R}^p .
$$
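As a quick numeric check of the subgradient inequality (an illustrative sketch, not part of the original text), one can verify it for the convex function $h(\beta) = \|\beta\|_1$ with a vector $g$ built from the signs of the nonzero coordinates of $\beta$ and any value in $[-1, 1]$ at the zero coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)

def h(beta):                       # h(beta) = ||beta||_1, a convex function
    return np.sum(np.abs(beta))

beta = np.array([1.5, -2.0, 0.0])
g = np.array([1.0, -1.0, 0.3])     # sign(beta_j) where beta_j != 0; 0.3 in [-1, 1] at the zero

# h(b') >= h(beta) + g^T (b' - beta) must hold for every b'
for _ in range(1000):
    b_prime = rng.normal(size=3) * 10
    assert h(b_prime) >= h(beta) + g @ (b_prime - beta) - 1e-12
print("subgradient inequality holds at all sampled points")
```

The inequality holds because $|g_j| \le 1$ gives $g^T \beta' \le \|\beta'\|_1$, while $g_j \beta_j = |\beta_j|$ gives $g^T \beta = \|\beta\|_1$, so the right-hand side reduces to $g^T \beta' \le h(\beta')$.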
If $h$ is a convex function on $\mathbb{R}^p$, it has a subgradient at every point in $\mathbb{R}^p$. For example, $h(\beta) = \|\beta\|_1$ is a convex function and its subgradient $g$ at $\beta$ is defined by