$$
g_j \in
\begin{cases}
\{+1\} & \text{if } \beta_j > 0,\\
\{-1\} & \text{if } \beta_j < 0,\\
[-1,+1] & \text{if } \beta_j = 0.
\end{cases}
$$
Note that the expression on the right defines a set of subgradients, not a single subgradient, since when $\beta_j = 0$ we can select any number in $[-1,+1]$ for the $j$th element of $g$. A set of subgradients is referred to as a subdifferential.
For simple notation, we define a vector-valued signum function $\operatorname{sgn}\colon \mathbb{R}^p \to \{-1,0,+1\}^p$ so that the $j$th element of $\operatorname{sgn}(\beta)$ is $+1$ if $\beta_j > 0$, $-1$ if $\beta_j < 0$, and $0$ otherwise. We also define a vector-valued soft threshold function $\operatorname{soft}(v,t)$ for a vector $v$ and a scalar $t$, so that

$$(\operatorname{soft}(v,t))_j = \operatorname{sgn}(v_j)\,\max\{0,\ |v_j| - t\}.$$

The next theorem summarizes the optimality conditions for sparse group lasso. The theorem is from [22]; however, our proof is much simpler and provides clearer arguments regarding the optimality conditions.
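As a quick illustration, the two definitions above can be sketched in NumPy; the helper names `sgn` and `soft` simply mirror the notation and are not from the original text:

```python
import numpy as np

def sgn(v):
    # Vector-valued signum: +1, -1, or 0 elementwise, as defined above.
    return np.sign(v)

def soft(v, t):
    # Vector-valued soft threshold: (soft(v, t))_j = sgn(v_j) * max(0, |v_j| - t).
    return sgn(v) * np.maximum(0.0, np.abs(v) - t)

v = np.array([3.0, -2.0, 0.0, 0.5])
print(soft(v, 1.0))  # each component shrinks toward zero by t = 1 and clips at zero
```

Components with magnitude below the threshold $t$ are set exactly to zero, which is what produces sparsity in the solutions below.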
Theorem 2. For a sparse group lasso problem (14.9) defined with $f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2$ in (14.5) and $X_{G_k}^T X_{G_k} = I_{|G_k|}$ for $k = 1, 2, \dots, K$, an optimal solution $\beta$ is given by

$$
\beta_{G_k} = \left(1 - \frac{\lambda(1-\alpha)}{\|\operatorname{soft}(s_k, \lambda\alpha)\|_2}\right)_+ \operatorname{soft}(s_k, \lambda\alpha), \qquad k = 1, 2, \dots, K,
$$

where $s_k := X_{G_k}^T (y - X\beta_{-G_k})$, $\beta_{-G_k}$ is a vector with the same elements as $\beta$ except for having zeros in the components at $j \in G_k$, and $(\cdot)_+ := \max(0, \cdot)$.
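A minimal sketch of this closed-form block update, assuming $s_k$, $\lambda$, and $\alpha$ are given; `block_solution` and `soft` are hypothetical helper names, not from the original text:

```python
import numpy as np

def soft(v, t):
    # Elementwise soft threshold: sgn(v_j) * max(0, |v_j| - t).
    return np.sign(v) * np.maximum(0.0, np.abs(v) - t)

def block_solution(s_k, lam, alpha):
    # Closed-form optimal beta_{G_k} from Theorem 2:
    # beta_{G_k} = (1 - lam*(1-alpha)/||soft(s_k, lam*alpha)||_2)_+ * soft(s_k, lam*alpha)
    u = soft(s_k, lam * alpha)
    norm_u = np.linalg.norm(u)
    if norm_u <= lam * (1.0 - alpha):   # the (.)_+ clips the whole group to zero
        return np.zeros_like(s_k)
    return (1.0 - lam * (1.0 - alpha) / norm_u) * u

s_k = np.array([2.0, -1.5, 0.1])
print(block_solution(s_k, lam=1.0, alpha=0.5))
```

The $\ell_1$ threshold $\lambda\alpha$ zeroes individual components (here the third one), while the group threshold $\lambda(1-\alpha)$ can zero the entire block at once.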
Proof. For $\beta_{G_k} \neq 0$, the objective function of (14.9) is differentiable, and therefore the first-order optimality condition can be written, using $X_{G_k}^T X_{G_k} = I_{|G_k|}$ and $s_k = X_{G_k}^T (y - X\beta_{-G_k})$, as

$$
\left(\frac{\lambda(1-\alpha)}{\|\beta_{G_k}\|_2} + 1\right)\beta_{G_k} - s_k + \lambda\alpha\, g_{G_k} = 0. \tag{14.10}
$$

Here $g_{G_k}$ is a subgradient of $\|\beta\|_1$ for the components in $G_k$.
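One way to see how (14.10) yields the closed form in Theorem 2: on the nonzero components $g_j = \operatorname{sgn}(\beta_j) = \operatorname{sgn}((s_k)_j)$, so (14.10) rearranges to

$$
\left(\frac{\lambda(1-\alpha)}{\|\beta_{G_k}\|_2} + 1\right)\beta_{G_k} = s_k - \lambda\alpha\, g_{G_k} = \operatorname{soft}(s_k, \lambda\alpha).
$$

Taking $\ell_2$ norms on both sides gives $\lambda(1-\alpha) + \|\beta_{G_k}\|_2 = \|\operatorname{soft}(s_k, \lambda\alpha)\|_2$, and substituting $\|\beta_{G_k}\|_2$ back yields $\beta_{G_k} = \bigl(1 - \lambda(1-\alpha)/\|\operatorname{soft}(s_k, \lambda\alpha)\|_2\bigr)\operatorname{soft}(s_k, \lambda\alpha)$, which is valid when $\|\operatorname{soft}(s_k, \lambda\alpha)\|_2 > \lambda(1-\alpha)$; the $(\cdot)_+$ covers the remaining case, $\beta_{G_k} = 0$, treated next.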
Suppose that $\beta_{G_k} = 0$ is optimal. Then the gradients of the sparse group lasso objective in (14.9) evaluated at two nonzero points, $\beta + \delta$ and $\beta - \delta$, where $\delta := (0, \dots, 0, \varepsilon, \dots, \varepsilon, 0, \dots, 0)^T$ is a $p$-dimensional vector with $\varepsilon > 0$ at the components $j \in G_k$, must satisfy
$$
\left(\frac{\lambda(1-\alpha)}{\varepsilon\sqrt{|G_k|}} + 1\right)\delta_{G_k} - s_k + \lambda\alpha\, g_{G_k} > 0,
$$
$$
-\left(\frac{\lambda(1-\alpha)}{\varepsilon\sqrt{|G_k|}} + 1\right)\delta_{G_k} - s_k + \lambda\alpha\, g_{G_k} < 0.
$$
Letting $\varepsilon \to 0$, we obtain

$$
\left|\left(-s_k + \lambda\alpha\, g_{G_k}\right)_j\right| \le \frac{\lambda(1-\alpha)}{\sqrt{|G_k|}} \quad \text{for } j = 1, \dots, |G_k|.
$$
This implies, together with the property of $g_{G_k}$ at $\beta_{G_k} = 0$, that

$$
\|\operatorname{soft}(s_k, \lambda\alpha)\|_2 \le \|s_k - \lambda\alpha\, g_{G_k}\|_2 \le \lambda(1-\alpha). \tag{14.11}
$$
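Condition (14.11) gives a practical test for whether a whole group is zero at the optimum: $\beta_{G_k} = 0$ exactly when $\|\operatorname{soft}(s_k, \lambda\alpha)\|_2 \le \lambda(1-\alpha)$. A minimal sketch of this check; `group_is_zero` and `soft` are hypothetical helper names, not from the original text:

```python
import numpy as np

def soft(v, t):
    # Elementwise soft threshold: sgn(v_j) * max(0, |v_j| - t).
    return np.sign(v) * np.maximum(0.0, np.abs(v) - t)

def group_is_zero(s_k, lam, alpha):
    # Zero-group test from (14.11): beta_{G_k} = 0 is optimal
    # iff ||soft(s_k, lam*alpha)||_2 <= lam*(1 - alpha).
    return np.linalg.norm(soft(s_k, lam * alpha)) <= lam * (1.0 - alpha)

# Correlations s_k barely above the l1 threshold are still killed by the
# group penalty; a stronger group survives.
print(group_is_zero(np.array([0.6, -0.55]), lam=1.0, alpha=0.5))  # True
print(group_is_zero(np.array([2.0, -1.5]), lam=1.0, alpha=0.5))   # False
```

Such a screening test is what lets block coordinate descent skip entire groups without solving for their coefficients.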