correctly, this diagonal matrix will be equivalent to a scalar multiplied by the identity matrix $I_M$. For any value $\lambda_2$, the matrix may always be scaled so that $H_k^{\mathrm{T}} H_k = \lambda_2 I_p$. Therefore, for the data set $(X, y)$ and a scalar $\lambda_2$, an augmented data set $(\Phi, Y)$ with $N + M$ observations and $M$ predictors can be defined by
\[
\Phi_{(N+M)\times M} = \begin{bmatrix} X \\ \lambda_2 I_M \end{bmatrix}, \qquad
Y_{(N+M)} = \begin{bmatrix} y \\ 0 \end{bmatrix},
\tag{14}
\]
where $\Phi \triangleq [\varphi_1, \ldots, \varphi_M]$. This augmented data set can now be fed into the original two-stage stepwise selection algorithm to form a two-stage gene selection procedure.
original two-stage stepwise selection algorithm to form a two-stage gene selection
procedure. The least square estimate the parameters is given by
θ = arg mi θ {
( Y
Φθ ) T ( Y
Φθ )
}
=(( Φ ) T Φ ) 1 ( Φ ) T Y.
(15)
It is found that such an augmented data technique not only integrates Ridge regularisation directly into the regression matrix, but also allows the following two-stage stepwise selection method to overcome the $M \gg N$ problem. It is noted that the augmented data technique does not change the correlation between the $i$th and $j$th regressors, $i, j = 1, \ldots, M$. The augmented data set can then be solved by the following two-stage stepwise algorithm [9].
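As a quick numerical check of the augmentation in (14) and its least-squares solution (15), the following NumPy sketch builds $(\Phi, Y)$ for a small $M \gg N$ problem (the variable names `X`, `y`, and `lam2` are illustrative). Note that with the $\lambda_2 I_M$ block as written, the normal equations give $\Phi^{\mathrm{T}}\Phi = X^{\mathrm{T}}X + \lambda_2^2 I_M$, so the augmented solve coincides with a ridge estimate whose penalty is $\lambda_2^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 50                      # fewer observations than predictors (M >> N)
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)
lam2 = 0.1

# Augmented data set (14): stack lam2 * I_M under X and M zeros under y.
Phi = np.vstack([X, lam2 * np.eye(M)])
Y = np.concatenate([y, np.zeros(M)])

# Least-squares estimate (15); lstsq avoids explicitly forming (Phi^T Phi)^-1.
theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

# Closed-form ridge solution with penalty lam2^2: (X^T X + lam2^2 I)^-1 X^T y.
theta_ridge = np.linalg.solve(X.T @ X + lam2**2 * np.eye(M), X.T @ y)
assert np.allclose(theta, theta_ridge)
```

The identity block guarantees $\Phi$ has full column rank even when $M \gg N$, which is exactly why the ordinary least-squares machinery in (15) becomes applicable.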
3.3 Two-Stage Stepwise Selection Method
Stepwise selection is the recommended subset selection technique owing to its superior performance [11]. The recently proposed two-stage selection algorithm [9], which comprises a forward selection stage and a second backward refinement stage, provides an efficient path to such a subset.
Forward Recursive Selection - First Stage. The forward selection stage
selects the regressors based on their contributions to maximizing the model error
reduction ratio, one regressor at a time. The selection procedure continues until
some termination criterion is met or a desired model size is reached.
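The forward stage can be sketched as a greedy loop that, at each step, adds the candidate regressor giving the largest drop in residual sum of squares. This is a minimal illustration of greedy forward selection, not the exact error-reduction-ratio formulation of [9]; the function name `forward_select` and the termination-by-model-size criterion are illustrative:

```python
import numpy as np

def forward_select(Phi, Y, n_terms):
    """Greedy forward selection: at each step, add the candidate column of
    Phi that most reduces the residual sum of squares (illustrative sketch)."""
    M = Phi.shape[1]
    selected = []
    for _ in range(n_terms):           # desired model size as the stop rule
        best_j, best_rss = None, np.inf
        for j in range(M):
            if j in selected:
                continue
            cols = Phi[:, selected + [j]]
            theta, *_ = np.linalg.lstsq(cols, Y, rcond=None)
            rss = float(np.sum((Y - cols @ theta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Toy usage: y depends on columns 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = 3 * X[:, 0] + 2 * X[:, 1]
selected = forward_select(X, y, 3)
```

In this toy run the first two chosen indices are the truly active columns 0 and 1, since each dominates the achievable error reduction when it is added.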
Model Refinement - Second Stage. The forward stage above generates a model; however, the forward selection stage is subject to the constraint that all previously selected regressors remain fixed and cannot later be removed from the model. To overcome this deficiency, each previously selected term is checked again and the model is refined. This review is repeated until all the selected model terms are more significant than those remaining in the candidate pool. Finally, a satisfactory model is produced.
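The refinement stage can be sketched as a swap loop: each selected term is re-examined in turn and replaced by a candidate from the pool whenever the replacement lowers the residual sum of squares, repeating until no swap helps. This is an illustrative sketch of the re-check-and-refine idea, not the exact significance-based procedure of [9]; the function name `refine` is an assumption:

```python
import numpy as np

def refine(Phi, Y, selected):
    """Second-stage refinement sketch: swap each selected term for any pool
    candidate that lowers the residual sum of squares; repeat to convergence."""
    def rss(idx):
        theta, *_ = np.linalg.lstsq(Phi[:, idx], Y, rcond=None)
        return float(np.sum((Y - Phi[:, idx] @ theta) ** 2))

    selected = list(selected)
    M = Phi.shape[1]
    improved = True
    while improved:                      # repeat the review until stable
        improved = False
        for pos in range(len(selected)):
            current = rss(selected)
            for j in range(M):
                if j in selected:
                    continue
                trial = selected[:pos] + [j] + selected[pos + 1:]
                if rss(trial) < current:
                    selected, current, improved = trial, rss(trial), True
    return selected

# Toy usage: start from a deliberately poor model [2, 1] for a target that
# actually depends on columns 0 and 1; refinement swaps 2 out for 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))
y = 3 * X[:, 0] + 2 * X[:, 1]
refined = refine(X, y, [2, 1])
```

This illustrates the point of the second stage: a term that looked acceptable when it was added can become removable once later terms are in the model.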
3.4 Complete Algorithm
The complete algorithm can be summarized as follows.
Step 1 Initialization: Set a small parameter set $\Lambda$ including positive values for $\lambda_2$, e.g., $\Lambda = \{0.01, 0.1, 1, 10, 100\}$, and the iteration index $I = 1$. Set the forward selection step (i.e., $S$) to a positive integer.