$$
g_j \in
\begin{cases}
\{+1\} & \text{if } \beta_j > 0,\\
\{-1\} & \text{if } \beta_j < 0,\\
[-1, +1] & \text{if } \beta_j = 0.
\end{cases}
$$
Note that the expression on the right defines a set of subgradients, not a single subgradient, since when $\beta_j = 0$ we can select any number in $[-1, +1]$ for the $j$-th element of $g$. A set of subgradients is referred to as a subdifferential.
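For instance (an illustrative point of our own choosing, not from the text): at $\beta = (2, -1, 0)^\top$ the subdifferential of $\|\beta\|_1$ is the set $\{(+1, -1, c)^\top : c \in [-1, +1]\}$, and any member of this set is a valid subgradient $g$.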
For notational simplicity, we define a vector-valued signum function $\mathrm{sgn}\colon \mathbb{R}^p \to \{-1, 0, +1\}^p$ so that the $j$-th element of $\mathrm{sgn}(\beta)$ is $+1$ if $\beta_j > 0$, $-1$ if $\beta_j < 0$, and $0$ otherwise. We also define a vector-valued soft-threshold function $\mathrm{soft}(v, t)$ for a vector $v$ and a scalar $t$, so that $(\mathrm{soft}(v, t))_j = \mathrm{sgn}(v_j)\,\max\{0, |v_j| - t\}$.
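As a minimal sketch (ours, not from the text), these two functions can be written in NumPy as follows; the names sgn and soft simply mirror the definitions above.

    import numpy as np

    def sgn(v):
        # Elementwise signum: +1 for positive entries, -1 for negative, 0 at zero.
        return np.sign(v)

    def soft(v, t):
        # Elementwise soft threshold: sgn(v_j) * max(0, |v_j| - t).
        return np.sign(v) * np.maximum(0.0, np.abs(v) - t)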
The next theorem summarizes the optimality conditions for the sparse group lasso. The theorem is from [22]; however, our proof is much simpler and provides clearer arguments regarding the optimality conditions.

Theorem 2. For a sparse group lasso problem (14.9) defined with $f(\beta) = \frac{1}{2}\|y - X\beta\|_2^2$ in (14.5) and $X_{G_k}^\top X_{G_k} = I_{|G_k|}$ for $k = 1, 2, \ldots, K$, an optimal solution $\beta$ is given by

$$
\beta_{G_k} = \left(1 - \frac{\lambda(1 - \alpha)}{\|\mathrm{soft}(s_k, \lambda\alpha)\|_2}\right)_{+} \mathrm{soft}(s_k, \lambda\alpha), \qquad k = 1, 2, \ldots, K,
$$

where $s_k := X_{G_k}^\top (y - X \beta_{\bar{G}_k})$, $\beta_{\bar{G}_k}$ is a vector with the same elements as $\beta$ except for having zeros in the components at $j \in G_k$, and $(\cdot)_+ := \max(0, \cdot)$.
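The closed form in Theorem 2 translates directly into code. The following is a minimal NumPy sketch under the theorem's assumption $X_{G_k}^\top X_{G_k} = I_{|G_k|}$; the function name group_update and its signature are our own illustration, not notation from the text.

    import numpy as np

    def group_update(X, y, beta, group, lam, alpha):
        # Closed-form update of beta over one group G_k (sketch of Theorem 2).
        # Assumes the columns of X indexed by `group` are orthonormal.
        beta_bar = beta.copy()
        beta_bar[group] = 0.0                       # beta with zeros at j in G_k
        s_k = X[:, group].T @ (y - X @ beta_bar)    # s_k = X_Gk^T (y - X beta_bar)
        z = np.sign(s_k) * np.maximum(0.0, np.abs(s_k) - lam * alpha)  # soft(s_k, lam*alpha)
        norm_z = np.linalg.norm(z)
        if norm_z <= lam * (1.0 - alpha):           # whole group shrunk to zero
            beta[group] = 0.0
        else:
            beta[group] = (1.0 - lam * (1.0 - alpha) / norm_z) * z
        return beta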
Proof. For $\beta_{G_k} \neq 0$, the objective function of (14.9) is differentiable, and therefore the first-order optimality condition can be written, using $X_{G_k}^\top X_{G_k} = I_{|G_k|}$ and $s_k = X_{G_k}^\top (y - X \beta_{\bar{G}_k})$, as

$$
\lambda(1 - \alpha)\,\frac{\beta_{G_k}}{\|\beta_{G_k}\|_2} - s_k + \lambda\alpha\, g_{G_k} + \beta_{G_k} = 0. \qquad (14.10)
$$

Here $g_{G_k}$ is a subgradient of $\|\beta\|_1$ for the components in $G_k$.
Suppose that $\beta_{G_k} = 0$ is optimal. Then the gradients of the sparse group lasso objective in (14.9), evaluated at the two nonzero points $\beta + \delta$ and $\beta - \delta$, where $\delta := (0, \ldots, 0, \epsilon, \ldots, \epsilon, 0, \ldots, 0)^\top$ is a $p$-dimensional vector with $\epsilon > 0$ at the components $j \in G_k$, must satisfy
$$
\lambda(1 - \alpha)\,\frac{\delta_{G_k}}{\epsilon\sqrt{|G_k|}} - s_k + \lambda\alpha\, g_{G_k} + \delta_{G_k} > 0,
$$
$$
-\lambda(1 - \alpha)\,\frac{\delta_{G_k}}{\epsilon\sqrt{|G_k|}} - s_k + \lambda\alpha\, g_{G_k} - \delta_{G_k} < 0.
$$
Letting $\epsilon$ tend to $0$, we obtain $\left|(-s_k + \lambda\alpha\, g_{G_k})_j\right| \le \lambda(1 - \alpha)/\sqrt{|G_k|}$ for $j = 1, \ldots, |G_k|$.
Together with the property of $g_{G_k}$ at $\beta_{G_k} = 0$, this implies that

$$
\|\mathrm{soft}(s_k, \lambda\alpha)\|_2 = \|{-s_k} + \lambda\alpha\, g_{G_k}\|_2 \le \lambda(1 - \alpha). \qquad (14.11)
$$
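Condition (14.11) doubles as a numerical test for whether an entire group is set to zero, and the closed form of Theorem 2 can be cycled over the groups. The following usage sketch (synthetic data of our own construction, reusing the group_update function above) is only illustrative; the excerpt itself does not prescribe this iteration scheme.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 40, 6
    groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # G_1, G_2
    X = np.linalg.qr(rng.standard_normal((n, p)))[0]      # orthonormal columns, so X_Gk^T X_Gk = I
    y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(n)

    beta = np.zeros(p)
    for _ in range(50):                                   # block-coordinate sweeps over the groups
        for g in groups:
            beta = group_update(X, y, beta, g, lam=0.5, alpha=0.5)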
 