Step 2 Model construction:
(A) The parameter λ2 takes a value from the grid Λ, and the augmented data set
(Φ, Y) is generated according to (14).
(C) Forward selection: At each step, the net contributions of all remaining
candidate model terms are computed, and the term that produces the maximum
contribution is selected. The selection procedure continues until some
termination criterion is met or a desired model size is reached, producing an
initial model with n regressors.
(D) Backward model refinement: Each regressor selected in the initial model is
shifted to the nth position and compared with all remaining candidate terms.
The shifting and comparison procedures are repeated until no insignificant term
remains in the selected model, which yields the final model.
(E) If K-fold cross-validation (CV) is used as the termination criterion, then in
each of the K experiments, K − 1 folds of the augmented data set are used for
training and the remaining fold for validation. After Steps 2(B)-(D) have been
run K times, the model size n is determined by the K-fold CV error.
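Equation (14) is not reproduced in this excerpt. Assuming it follows the elastic-net data augmentation of Zou and Hastie (2005), which is an assumption and not confirmed by the text, Step 2(A) can be sketched in plain Python as follows; `augment` is a hypothetical helper name.

```python
import math

def augment(X, y, lam2):
    """Return the augmented pair (Phi, Y) for one grid value lam2 (λ2).

    ASSUMPTION: Eq. (14) is taken to be the elastic-net augmentation
        Phi = (1 + λ2)^(-1/2) [X; sqrt(λ2) I_p],   Y = [y; 0_p].
    X is a list of n rows with p entries each; y has n entries.
    """
    p = len(X[0])
    scale = 1.0 / math.sqrt(1.0 + lam2)
    # The n original samples, rescaled.
    Phi = [[scale * v for v in row] for row in X]
    # p artificial samples: a scaled identity block acting as a ridge penalty.
    for j in range(p):
        Phi.append([scale * math.sqrt(lam2) if k == j else 0.0 for k in range(p)])
    # The artificial samples have zero targets.
    Y = list(y) + [0.0] * p
    return Phi, Y

Phi, Y = augment([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0], lam2=3.0)
print(len(Phi), Y)   # 4 [1.0, 2.0, 0.0, 0.0]
```

Under this assumption, Φ^T Φ = (X^T X + λ2 I)/(1 + λ2), so any least-squares fit on (Φ, Y) implicitly carries a ridge-type penalty controlled by λ2.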
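The two selection stages of Steps 2(C) and 2(D) can be sketched as below, assuming the "net contribution" of a term is the reduction in residual sum of squares (RSS) it brings. The excerpt does not define it, so this criterion is an assumption. For clarity, each RSS is recomputed with Gram-Schmidt instead of the fast recursive updates a practical implementation would use. All function names are hypothetical.

```python
import math

def rss(Phi, Y, cols):
    """Residual sum of squares after a least-squares fit of Y on the columns in `cols`."""
    r = list(Y)
    basis = []
    for c in cols:
        v = [row[c] for row in Phi]
        for q in basis:  # modified Gram-Schmidt against already accepted columns
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:  # column is (numerically) a combination of earlier ones
            continue
        q = [a / norm for a in v]
        basis.append(q)
        d = sum(a * b for a, b in zip(r, q))
        r = [a - d * b for a, b in zip(r, q)]  # remove the residual's component along q
    return sum(a * a for a in r)

def net_contribution(Phi, Y, selected, j):
    """RSS reduction from adding column j to `selected` (assumed criterion)."""
    return rss(Phi, Y, selected) - rss(Phi, Y, selected + [j])

def forward_select(Phi, Y, n_terms):
    """Stage 1: repeatedly add the candidate with the largest net contribution."""
    selected = []
    for _ in range(n_terms):
        candidates = [j for j in range(len(Phi[0])) if j not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda j: net_contribution(Phi, Y, selected, j))
        if net_contribution(Phi, Y, selected, best) <= 1e-12:
            break  # no remaining term reduces the residual
        selected.append(best)
    return selected

def backward_refine(Phi, Y, selected, max_sweeps=20):
    """Stage 2: move each selected term to the last position and swap in a better candidate."""
    selected = list(selected)
    for _ in range(max_sweeps):
        changed = False
        for pos in range(len(selected)):
            others = selected[:pos] + selected[pos + 1:]
            # Contribution of the current term when it is placed last.
            own = net_contribution(Phi, Y, others, selected[pos])
            for j in range(len(Phi[0])):
                if j in selected:
                    continue
                gain = net_contribution(Phi, Y, others, j)
                if gain > own + 1e-12:  # candidate j is more significant than the current term
                    selected[pos], own, changed = j, gain, True
        if not changed:
            break  # no insignificant term remains
    return selected

Phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
Y = [2.0, 0.0, 3.0, 0.0]
print(forward_select(Phi, Y, 2))        # [2, 0]
print(backward_refine(Phi, Y, [1, 2]))  # [0, 2]
```

In the toy example, refinement replaces the useless term 1 with term 0 because term 0 gives a larger RSS reduction when placed in the last position. Every swap strictly lowers the RSS, so the sweeps terminate. `max_sweeps` is only a safeguard.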
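The fold bookkeeping of Step 2(E) is sketched below. The model fitting itself is delegated to a hypothetical callback `fold_error`, so only the CV mechanics are shown: K − 1 folds train, one fold validates, and the mean validation error over the K runs selects the model size n. Interleaved folds are an illustrative choice; a real run would usually shuffle the rows first.

```python
from typing import Callable, List, Tuple

def kfold_splits(m: int, K: int) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..m-1 of the augmented data into K interleaved folds.

    Returns K (train, validation) pairs: K-1 folds for training, 1 fold for validation.
    """
    folds = [list(range(k, m, K)) for k in range(K)]
    splits = []
    for k in range(K):
        train = [i for f in range(K) if f != k for i in folds[f]]
        splits.append((train, folds[k]))
    return splits

def cv_model_size(m: int, K: int, n_max: int,
                  fold_error: Callable[[List[int], List[int], int], float]
                  ) -> Tuple[int, List[float]]:
    """Pick the model size n in 1..n_max with the smallest mean validation error.

    `fold_error(train, val, n)` is a hypothetical callback: it should run the
    selection steps on the training rows, keep the first n terms, and return
    the error on the validation rows.
    """
    splits = kfold_splits(m, K)
    mean_err = [sum(fold_error(tr, va, n) for tr, va in splits) / K
                for n in range(1, n_max + 1)]
    best_n = 1 + min(range(n_max), key=lambda i: mean_err[i])
    return best_n, mean_err

# Toy callback whose error is smallest for a 3-term model.
toy = lambda train, val, n: (n - 3) ** 2 + len(val) / 10.0
best_n, errs = cv_model_size(m=12, K=5, n_max=6, fold_error=toy)
print(best_n)  # 3
```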
Step 3 Determining λ2 and the corresponding model:
(A) The procedure terminates when λ2 has taken the final element of Λ.
Otherwise, set I = I + 1 and go to Step 2.
(B) λ2 is chosen as the grid value giving the smallest CV error, and the
corresponding model is produced.
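Step 3 amounts to an outer loop over the grid Λ. The following is a minimal sketch. The callback `run_step2` is hypothetical and stands in for one pass through Step 2, returning that pass's CV error together with its model.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

def select_lambda2(grid: Sequence[float],
                   run_step2: Callable[[float], Tuple[float, Any]]
                   ) -> Optional[Tuple[float, float, Any]]:
    """Loop over the grid and keep the λ2 with the smallest CV error."""
    best = None
    for I in range(len(grid)):        # I = I + 1 until λ2 takes the last element of the grid
        lam2 = grid[I]
        cv_error, model = run_step2(lam2)
        if best is None or cv_error < best[1]:
            best = (lam2, cv_error, model)
    return best                       # (λ2, its CV error, its model); None for an empty grid

toy = lambda lam2: ((lam2 - 0.5) ** 2, f"model(lambda2={lam2})")
print(select_lambda2([0.0, 0.25, 0.5, 1.0], toy))  # (0.5, 0.0, 'model(lambda2=0.5)')
```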
4 Simulation
Arthritis is a form of joint disorder that involves inflammation of one or more
joints. The Arthritis data set [12] contains samples of two types: rheumatoid
arthritis (RA) and osteoarthritis (OA). RA is a systemic disease characterized
by an aggressive infiltration of the synovium, which degrades cartilage and bone.
OA does not display these histological features, although degradative proteases
are nevertheless produced in the synovium. Although OA is the most common type of
arthritis, RA is recognized as the most crippling or disabling type of arthritis.
Therefore, the classification of these two clinically distinct forms of arthritis is
an important issue.
The arthritis data describe the expression of 755 genes in 7 OA and 24 RA
samples. The correlations among all 755 genes are shown in Fig. 1; clearly, some
genes are highly correlated. To verify the ability of the proposed TSGS approach
to overcome the small-sample problem, all 31 samples were used as the training
set, with the initial parameters set to S = 100 and λ2 = 0.5. Model fitting and
tuning-parameter selection by 5-fold cross-validation (CV) were performed on the
training data. The solution paths of the parameter estimates are shown in Fig. 2,
where 42 genes were selected by CV. Since 42 exceeds the number of samples (31),
the proposed TSGS method can select more genes than there are samples, which
shows that it can handle the small-sample problem.
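The reason more genes than samples can be selected can be illustrated by a rank argument, assuming (14) is the elastic-net augmentation (an assumption). An n-row design matrix has rank at most n, so least-squares forward selection on the raw data can add at most n linearly independent regressors. The augmented matrix, by contrast, has full column rank p for any λ2 > 0. The toy check below uses random data and hypothetical sizes n = 5 and p = 12. It illustrates the argument only and does not reproduce the experiment above.

```python
import math
import random

def column_rank(A, tol=1e-9):
    """Numerical column rank of A (a list of rows) via modified Gram-Schmidt."""
    basis = []
    for j in range(len(A[0])):
        v = [row[j] for row in A]
        for q in basis:
            d = sum(a * b for a, b in zip(v, q))
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # column adds a new direction
            basis.append([a / norm for a in v])
    return len(basis)

def augment(X, lam2):
    """ASSUMED form of (14): (1 + λ2)^(-1/2) [X; sqrt(λ2) I_p]."""
    p = len(X[0])
    s = 1.0 / math.sqrt(1.0 + lam2)
    rows = [[s * v for v in row] for row in X]
    rows += [[s * math.sqrt(lam2) if k == j else 0.0 for k in range(p)] for j in range(p)]
    return rows

random.seed(0)
n, p = 5, 12  # fewer samples than genes
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
print(column_rank(X))               # 5
print(column_rank(augment(X, 0.5)))  # 12
```

With λ2 = 0 the identity block vanishes and the rank falls back to n, which is why a positive λ2 is needed.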