correctly, this diagonal matrix will be equivalent to a scalar multiplied by the identity matrix $I_M$. For any value $\lambda_2$, the matrix may always be scaled so that $H_k^{\mathrm{T}} H_k = \lambda_2 I_p$. Therefore, for the data set $(X, y)$ and a scalar $\lambda_2$, an augmented data set $(\Phi, Y)$ with $N + M$ observations and $M$ predictors can be defined by
\[
\Phi_{(N+M)\times M} = \begin{bmatrix} X \\ \lambda_2 I_M \end{bmatrix}, \qquad
Y_{(N+M)} = \begin{bmatrix} y \\ 0 \end{bmatrix},
\tag{14}
\]
where $\Phi \triangleq [\varphi_1, \ldots, \varphi_M]$. This augmented data set can now be fed into the original two-stage stepwise selection algorithm to form a two-stage gene selection procedure.
original two-stage stepwise selection algorithm to form a two-stage gene selection
procedure. The least square estimate the parameters is given by
θ = arg mi θ {
( Y
Φθ ) T ( Y
Φθ )
}
=(( Φ ) T Φ ) 1 ( Φ ) T Y.
(15)
It is found that such an augmented data technique not only integrates Ridge regularisation directly into the regression matrix, but also allows the following two-stage stepwise selection method to overcome the $M \gg N$ problem. It is noted that the augmented data technique does not change the correlation between the $i$th and $j$th regressors, $i, j = 1, \ldots, M$. The augmented data set can then be solved by the following two-stage stepwise algorithm [9].
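As a quick numerical check of the augmentation in (14) and its least-squares solution (15), the following NumPy sketch builds $(\Phi, Y)$ for a small $M \gg N$ problem (the variable names `X`, `y`, and `lam2` are illustrative). Note that with the $\lambda_2 I_M$ block as written, the normal equations give $\Phi^{\mathrm{T}}\Phi = X^{\mathrm{T}}X + \lambda_2^2 I_M$, so the augmented solve coincides with a ridge estimate whose penalty is $\lambda_2^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 50                      # fewer observations than predictors (M >> N)
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)
lam2 = 0.1

# Augmented data set (14): stack lam2 * I_M under X and M zeros under y.
Phi = np.vstack([X, lam2 * np.eye(M)])
Y = np.concatenate([y, np.zeros(M)])

# Least-squares estimate (15); lstsq avoids explicitly forming (Phi^T Phi)^-1.
theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

# Closed-form ridge solution with penalty lam2^2: (X^T X + lam2^2 I)^-1 X^T y.
theta_ridge = np.linalg.solve(X.T @ X + lam2**2 * np.eye(M), X.T @ y)
assert np.allclose(theta, theta_ridge)
```

The identity block guarantees $\Phi$ has full column rank even when $M \gg N$, which is exactly why the ordinary least-squares machinery in (15) becomes applicable.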
3.3 Two-Stage Stepwise Selection Method
Stepwise selection is the recommended subset selection technique owing to its superior performance [11]. The recently proposed two-stage selection algorithm [9], which comprises a forward selection stage and a second backward refinement stage, provides an efficient path to such a subset.
Forward Recursive Selection - First Stage. The forward selection stage
selects the regressors based on their contributions to maximizing the model error
reduction ratio, one regressor at a time. The selection procedure continues until
some termination criterion is met or a desired model size is reached.
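The forward stage can be sketched as a greedy loop that, at each step, adds the candidate regressor giving the largest drop in residual sum of squares. This is a minimal illustration of greedy forward selection, not the exact error-reduction-ratio formulation of [9]; the function name `forward_select` and the termination-by-model-size criterion are illustrative:

```python
import numpy as np

def forward_select(Phi, Y, n_terms):
    """Greedy forward selection: at each step, add the candidate column of
    Phi that most reduces the residual sum of squares (illustrative sketch)."""
    M = Phi.shape[1]
    selected = []
    for _ in range(n_terms):           # desired model size as the stop rule
        best_j, best_rss = None, np.inf
        for j in range(M):
            if j in selected:
                continue
            cols = Phi[:, selected + [j]]
            theta, *_ = np.linalg.lstsq(cols, Y, rcond=None)
            rss = float(np.sum((Y - cols @ theta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Toy usage: y depends on columns 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = 3 * X[:, 0] + 2 * X[:, 1]
selected = forward_select(X, y, 3)
```

In this toy run the first two chosen indices are the truly active columns 0 and 1, since each dominates the achievable error reduction when it is added.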
Model Refinement - Second Stage. The forward stage above generates a model; however, the forward selection stage is subject to the constraint that all previously selected regressors remain fixed and cannot later be removed from the model. To overcome this deficiency, each previously selected term is checked again and the model is refined. This review is repeated until all the selected model terms are more significant than those remaining in the candidate pool. Finally, a satisfactory model is produced.
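The refinement stage can be sketched as a swap loop: each selected term is re-examined in turn and replaced by a candidate from the pool whenever the replacement lowers the residual sum of squares, repeating until no swap helps. This is an illustrative sketch of the re-check-and-refine idea, not the exact significance-based procedure of [9]; the function name `refine` is an assumption:

```python
import numpy as np

def refine(Phi, Y, selected):
    """Second-stage refinement sketch: swap each selected term for any pool
    candidate that lowers the residual sum of squares; repeat to convergence."""
    def rss(idx):
        theta, *_ = np.linalg.lstsq(Phi[:, idx], Y, rcond=None)
        return float(np.sum((Y - Phi[:, idx] @ theta) ** 2))

    selected = list(selected)
    M = Phi.shape[1]
    improved = True
    while improved:                      # repeat the review until stable
        improved = False
        for pos in range(len(selected)):
            current = rss(selected)
            for j in range(M):
                if j in selected:
                    continue
                trial = selected[:pos] + [j] + selected[pos + 1:]
                if rss(trial) < current:
                    selected, current, improved = trial, rss(trial), True
    return selected

# Toy usage: start from a deliberately poor model [2, 1] for a target that
# actually depends on columns 0 and 1; refinement swaps 2 out for 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = 3 * X[:, 0] + 2 * X[:, 1]
refined = refine(X, y, [2, 1])
```

This illustrates the point of the second stage: a term that looked acceptable when it was added can become removable once later terms are in the model.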
3.4 Complete Algorithm
The complete algorithm can be summarized as follows.
Step 1 Initialization: Set a small parameter set $\Lambda$ including positive values for $\lambda_2$, e.g., $\Lambda = \{0.01, 0.1, 1, 10, 100\}$, and the iteration index $I = 1$. Set the forward selection step (i.e., $S$) to a positive integer.