where $\rho_{ij}$ represents the correlation between the $i$th and $j$th regressors.
Proof. The subgradient of the objective function (4) with respect to $\beta$ satisfies
$$\left.\frac{\partial J(\beta, \lambda_2)}{\partial \beta_k}\right|_{\beta = \hat\beta} = 0, \qquad \hat\beta_k \neq 0 \tag{9}$$
For $\hat\beta_i \hat\beta_j \neq 0$, it follows from (9) that
$$\hat\beta_i - \hat\beta_j = \frac{1}{\lambda_2}\left(x^{(i)} - x^{(j)}\right)^{\top}\left(y - X\hat\beta\right) \tag{10}$$
From (4) and (5), we have
$$\left\|y - X\hat\beta\right\|^2 \le J(\lambda_2, \hat\beta) \le J(\lambda_2, \beta = 0) = \|y\|^2 \tag{11}$$
Since the columns of $X$ are standardized, it can easily be obtained that
$$\left\|x^{(i)} - x^{(j)}\right\|^2 = \left\|x^{(i)}\right\|^2 + \left\|x^{(j)}\right\|^2 - 2\,x^{(i)\top} x^{(j)} = 2(1 - \rho_{ij}) \tag{12}$$
Using (11), (12), and the Cauchy–Schwarz inequality, (10) can be rewritten as
$$\left|\hat\beta_i - \hat\beta_j\right| \le \frac{1}{\lambda_2}\sqrt{2(1 - \rho_{ij})}\,\|y\| \tag{13}$$
This completes the proof.
Theorem 1 describes the difference between the coefficient paths of the $i$th and $j$th regressors. If the $i$th and $j$th regressors are highly correlated, the regression method will assign them almost identical coefficients (differing only in sign if they are negatively correlated). Theorem 1 therefore provides a quantitative description of the grouping effect of the Ridge penalty.
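As a quick illustration, the sketch below checks the bound (13) numerically. Since the objective (4) is not reproduced on this page, it assumes the standard ridge form $J(\beta, \lambda_2) = \|y - X\beta\|^2 + \lambda_2 \|\beta\|^2$; the data and variable names are illustrative only.

```python
import numpy as np

# Minimal numerical check of the bound in (13), assuming the objective
# (4) is J(beta, lambda2) = ||y - X beta||^2 + lambda2 * ||beta||^2.
rng = np.random.default_rng(0)
n, lam2 = 200, 5.0

# Two highly correlated regressors plus three unrelated ones.
z = rng.standard_normal(n)
X = np.column_stack([z + 0.05 * rng.standard_normal(n),
                     z + 0.05 * rng.standard_normal(n),
                     rng.standard_normal((n, 3))])
X -= X.mean(axis=0)             # centre each column ...
X /= np.linalg.norm(X, axis=0)  # ... and standardize to unit norm
y = X @ np.array([3.0, 3.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)

# Ridge estimate: beta_hat = (X'X + lambda2 * I)^(-1) X'y.
beta = np.linalg.solve(X.T @ X + lam2 * np.eye(X.shape[1]), X.T @ y)

rho = X[:, 0] @ X[:, 1]         # correlation of the two regressors
lhs = abs(beta[0] - beta[1])
rhs = np.sqrt(2 * (1 - rho)) * np.linalg.norm(y) / lam2
print(f"|b1 - b2| = {lhs:.2e} <= bound = {rhs:.2e}: {lhs <= rhs}")
```

As $\rho_{ij}$ approaches 1, the right-hand side of (13) collapses towards zero, forcing the two coefficients together; this is exactly the grouping effect described above.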
However, like most variable selection methods, the regularised two-stage stepwise selection method [9] is still unable to select more than $N$ regressors in the $M \gg N$ scenario. The augmented data technique provides an effective remedy by supplementing the actual data with a fictitious set of data points taken according to an orthogonal experiment.
3.2 Overcoming $M \gg N$ Using the Augmented Data Technique
An augmented data set method for regularised regression problems was intro-
duced as follows.
Theorem 2 [10]: The ridge estimator is equivalent to a least squares estimator when the actual data are supplemented by a fictitious set of data points taken according to an orthogonal experiment $H_k$, with the response $y$ set to zero for each of these supplementary data points.
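A minimal sketch of this equivalence follows, assuming the simplest admissible choice $H_k = \sqrt{\lambda_2}\, I_M$ (so that $H_k^\top H_k = \lambda_2 I$); the theorem allows any suitably scaled orthogonal experiment, and the names below are illustrative.

```python
import numpy as np

# Sketch of Theorem 2: ordinary least squares on the augmented data
# reproduces the ridge estimator. H_k is taken here as sqrt(lambda2) * I,
# an assumed (simplest) design with H_k' H_k = lambda2 * I.
rng = np.random.default_rng(1)
n, m, lam2 = 50, 8, 2.0
X = rng.standard_normal((n, m))
y = rng.standard_normal(n)

# Ridge estimator on the actual data.
beta_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(m), X.T @ y)

# Least squares on the augmented data: stack the fictitious rows H_k
# under X, with their responses set to zero as the theorem prescribes.
H_k = np.sqrt(lam2) * np.eye(m)
X_aug = np.vstack([X, H_k])
y_aug = np.concatenate([y, np.zeros(m)])
beta_ls, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_ls))  # True: the estimators coincide
```

Note that the augmented design has $N + M$ rows and full column rank even when $M > N$, which is what allows the method to sidestep the $M \gg N$ limitation discussed above.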
According to Theorem 2, since $H_k$ is an orthogonal matrix, the matrix $H_k^\top H_k$ is obviously diagonal. If the orthogonal columns have also been constructed