difference of the learning methods lies in the optimisation of different cost functions. According to the "sum-of-squared errors + penalty" criterion, these optimisation problems can be formulated in the following form:

$$\hat{\beta} = \arg\min_{\beta} \left\{ \|y - X\beta\|^2 + f(\lambda, \beta) \right\}, \qquad (3)$$

where $f(\lambda, \beta)$ is usually the $L_1$-norm penalty function, the $L_2$-norm penalty function, or the Elastic Net penalty function (a combination of both the $L_1$-norm and the $L_2$-norm penalties).
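The three penalty choices just listed can be written down directly. A minimal Python/NumPy sketch; the function names and the split of the Elastic Net into two separate regularisation parameters are illustrative assumptions, not from the source:

```python
import numpy as np

def l1_penalty(lam, beta):
    # L1-norm (lasso-type) penalty: lam * sum_j |beta_j|
    return lam * np.sum(np.abs(beta))

def l2_penalty(lam, beta):
    # L2-norm (ridge-type) penalty: lam * sum_j beta_j^2
    return lam * np.sum(beta ** 2)

def elastic_net_penalty(lam1, lam2, beta):
    # Elastic Net: both the L1-norm and the L2-norm penalties
    return l1_penalty(lam1, beta) + l2_penalty(lam2, beta)
```

Any of these can play the role of $f(\lambda, \beta)$ in (3); the choice determines whether the estimator favours sparsity, shrinkage, or both.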
The two-stage stepwise selection method [9] only considers the optimisation of the sum-of-squared errors, and therefore cannot select genes with high correlation. To extend the gene selection ability of the recently proposed two-stage stepwise selection, the $L_2$-norm penalty is added to the cost function as follows:

$$J(\beta, \lambda_2) = \|y - X\beta\|^2 + \lambda_2 \|\beta\|^2, \qquad (4)$$

where $\lambda_2$ is the regularisation parameter and $\|\beta\|^2 = \sum_{j=1}^{M} \beta_j^2$. The estimator $\hat{\beta}$ is the minimizer of (4):

$$\hat{\beta} = \arg\min_{\beta} \left\{ J(\beta, \lambda_2) \right\}. \qquad (5)$$
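The cost function (4) translates directly into code. A minimal sketch in Python/NumPy, where `X`, `y`, `beta`, and `lam2` are generic placeholders for the regression matrix, centred response, coefficient vector, and regularisation parameter:

```python
import numpy as np

def ridge_cost(beta, X, y, lam2):
    """J(beta, lambda_2) = ||y - X beta||^2 + lambda_2 * sum_j beta_j^2  (Eq. 4)."""
    resid = y - X @ beta
    return resid @ resid + lam2 * np.sum(beta ** 2)
```

The estimator (5) is the vector `beta` that minimises this quantity for a fixed `lam2`.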
3 Two-Stage Gene Selection Method
3.1 The Grouping Effect of the Ridge Penalty
Qualitatively speaking, a regression method exhibits the grouping effect if the regression coefficients of a group of highly correlated variables tend to be equal (up to a change of sign if negatively correlated) [8]. In fact, (4) is the ridge optimisation problem. The $L_2$ penalty in ridge regularisation can provide the grouping effect, as shown in the following.
After the regression matrix $X$ is standardised, then obviously

$$X^T X = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1M} \\ * & 1 & \cdots & \rho_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ * & * & \cdots & 1 \end{bmatrix}, \qquad (6)$$
where $\rho_{i,j}$ is the sample correlation between the $i$-th regressor and the $j$-th regressor, and '$*$' represents the symmetrical structure. The ridge estimator is expressed by
$$\hat{\beta} = (X^T X + \lambda_2 I)^{-1} X^T y. \qquad (7)$$
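Equation (7) gives the ridge estimator in closed form. A minimal numerical sketch in Python/NumPy with synthetic data; the sample sizes, coefficients, and noise level are illustrative assumptions, not values from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 50, 5

# Synthetic standardised regression matrix and centred response.
X = rng.standard_normal((n, M))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 2.0]) + 0.1 * rng.standard_normal(n)
y = y - y.mean()

# Closed-form ridge estimator of Eq. (7): (X^T X + lambda_2 I)^{-1} X^T y.
lam2 = 1.0
beta_hat = np.linalg.solve(X.T @ X + lam2 * np.eye(M), X.T @ y)
```

Since (4) is convex in $\beta$, `beta_hat` is exactly the point where the gradient of $J(\beta, \lambda_2)$ vanishes, which gives a simple sanity check on the implementation.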
Theorem 1: Suppose that the response $y$ is centred, the regression matrix $X$ is standardised, and $\hat{\beta}$ is the solution of (4). If $\hat{\beta}_i \hat{\beta}_j \neq 0$, then

$$|\hat{\beta}_i - \hat{\beta}_j| \leq \frac{\|y\|}{\lambda_2} \sqrt{2(1 - \rho_{ij})}. \qquad (8)$$
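The bound (8) can be checked numerically. A small sketch, assuming the Euclidean norm for $\|y\|$ and two nearly identical synthetic regressors (the noise levels and sample size are illustrative assumptions): when $\rho_{ij}$ is close to 1, the right-hand side of (8) is small, forcing the two ridge coefficients close together.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two regressors that are small perturbations of the same latent variable z.
z = rng.standard_normal(n)
X = np.column_stack([z + 0.05 * rng.standard_normal(n) for _ in range(2)])

# Standardise so that diag(X^T X) = 1 and off-diagonals are sample correlations,
# matching the structure of Eq. (6).
X = (X - X.mean(axis=0)) / (X.std(axis=0) * np.sqrt(n))
y = z + 0.1 * rng.standard_normal(n)
y = y - y.mean()

lam2 = 1.0
beta = np.linalg.solve(X.T @ X + lam2 * np.eye(2), X.T @ y)  # Eq. (7)

rho = (X.T @ X)[0, 1]                                   # sample correlation rho_12
bound = np.linalg.norm(y) / lam2 * np.sqrt(2.0 * (1.0 - rho))  # RHS of Eq. (8)
```

With `rho` near 1, `abs(beta[0] - beta[1])` stays below `bound`, illustrating the grouping effect: highly correlated regressors receive nearly equal ridge coefficients.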