where $\rho_{ij}$ represents the correlation between the $i$th and $j$th regressors.
Proof. The subgradient of the objective function (4) with respect to $\beta$ satisfies

$$\left.\frac{\partial J(\beta,\lambda_2)}{\partial \beta_k}\right|_{\beta=\hat{\beta}} = 0, \qquad \hat{\beta}_k \neq 0 \tag{9}$$
For $\hat{\beta}_i \hat{\beta}_j \neq 0$, it follows from (9) that

$$\hat{\beta}_i - \hat{\beta}_j = \frac{1}{\lambda_2}\,\bigl(x_{(i)} - x_{(j)}\bigr)^{T}\bigl(y - X\hat{\beta}\bigr) \tag{10}$$
From (4) and (5), we have

$$\bigl\|y - X\hat{\beta}\bigr\|^2 \le J(\lambda_2, \hat{\beta}) \le J(\lambda_2, \beta = 0) = \|y\|^2 \tag{11}$$
Since the columns of $X$ are standardized, it can easily be obtained that

$$\bigl\|x_{(i)} - x_{(j)}\bigr\|^2 = \bigl\|x_{(i)}\bigr\|^2 + \bigl\|x_{(j)}\bigr\|^2 - 2\,x_{(i)}^{T}x_{(j)} = 2(1 - \rho_{ij}) \tag{12}$$
Applying the Cauchy–Schwarz inequality to (10), and bounding the two resulting factors by (12) and (11) respectively, (10) can be rewritten as

$$\bigl|\hat{\beta}_i - \hat{\beta}_j\bigr| \le \frac{1}{\lambda_2}\,\|y\|\sqrt{2(1 - \rho_{ij})} \tag{13}$$
This completes the proof.
Theorem 1 describes the difference between the coefficient paths of the $i$th and $j$th regressors. If the $i$th and $j$th regressors are highly correlated, the regression method will assign them almost identical coefficients (differing only in sign if they are negatively correlated). Theorem 1 therefore provides a quantitative description of the grouping effect of the Ridge penalty; the sketch below checks the bound numerically.
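The following minimal NumPy sketch (hypothetical data and parameter values, not taken from the paper) fits a ridge estimator to standardized regressors containing a nearly collinear pair and verifies that the coefficient difference respects the bound (13):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 100, 10
X = rng.standard_normal((N, M))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(N)   # nearly collinear pair
# Standardize: zero-mean, unit-norm columns, as Theorem 1 assumes.
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = rng.standard_normal(N)

lam2 = 1.0
# Ridge estimator: (X'X + lam2*I)^{-1} X'y
beta = np.linalg.solve(X.T @ X + lam2 * np.eye(M), X.T @ y)

rho_ij = X[:, 0] @ X[:, 1]          # correlation of standardized columns
bound = np.linalg.norm(y) * np.sqrt(2.0 * (1.0 - rho_ij)) / lam2
assert abs(beta[0] - beta[1]) <= bound
print(f"|b_i - b_j| = {abs(beta[0] - beta[1]):.2e} <= bound = {bound:.2e}")
```

Because $\rho_{ij}$ is close to one for the collinear pair, the bound forces the two coefficients to nearly coincide, as the theorem predicts.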
However, like most variable selection methods, the regularised two-stage stepwise selection method [9] is still unable to select more than $N$ regressors in the $M \gg N$ scenario. The augmented data technique provides an effective remedy by supplementing the actual data with a fictitious set of data points taken according to an orthogonal experiment.
3.2 Overcoming $M \gg N$ Using the Augmented Data Technique
An augmented data set method for regularised regression problems was intro-
duced as follows.
Theorem 2 [10]: The ridge estimator is equivalent to a least squares estimator when the actual data are supplemented by a fictitious set of data points taken according to an orthogonal experiment $H_k$; the response $y$ being set to zero for each of these supplementary data points.
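A minimal NumPy sketch of this equivalence follows (hypothetical data; the identity matrix stands in for the orthogonal experiment $H_k$, since any $H_k$ with $H_k^{T}H_k = I$ yields the same estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 40, 8
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)
lam2 = 0.5

# Ridge estimator computed directly: (X'X + lam2*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(M), X.T @ y)

# Augmented least squares: append sqrt(lam2)*H_k below X and zeros below y,
# so that X_aug'X_aug = X'X + lam2*I and X_aug'y_aug = X'y.
H_k = np.eye(M)                     # stand-in for the orthogonal experiment
X_aug = np.vstack([X, np.sqrt(lam2) * H_k])
y_aug = np.concatenate([y, np.zeros(M)])
beta_ls, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

assert np.allclose(beta_ridge, beta_ls)   # the two estimators coincide
```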
According to Theorem 2, since $H_k$ is an orthogonal matrix, the matrix $H_k^{T}H_k$ is clearly diagonal. If the orthogonal columns have also been constructed