An Efficient Two-Stage Gene Selection Method
for Microarray Data
Dajun Du 1,2, Kang Li 2, and Jing Deng 2
1 Key Laboratory of Power Station Automation Technology
Department of Automation, Shanghai University, 200072 Shanghai, China
ddj@shu.edu.cn
2 School of Electronics, Electrical Engineering and Computer Science,
Queen's University Belfast, United Kingdom
k.li@ee.qub.ac.uk, jdeng01@qub.ac.uk
Abstract. Gene selection is a key issue in the analysis of microarray
data with small samples and variant correlation. The main objective of
this paper is to select the most informative genes from thousands of
genes with strong correlation. This is achieved by proposing an efficient
two-stage gene selection (TSGS) algorithm. In this algorithm, the L2-norm
penalty is first introduced to achieve the grouping effect for
the highly correlated genes. To overcome the small samples problem,
the augmented data technique is then used to produce an augmented
data set. Finally, by using the recently proposed two-stage algorithm,
the most informative genes can be selected effectively. Simulation results
confirm the effectiveness of the proposed approach in comparison with
the popular Elastic Net method.
Keywords: Gene selection, microarray data, small samples, two-stepwise
selection method, variant correlation
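The augmented data idea named in the abstract can be illustrated with a standard identity: an L2-penalized least-squares problem on n samples and p genes is equivalent to an ordinary least-squares problem on n + p augmented samples. The sketch below is only an illustration under stated assumptions. The excerpt does not give the paper's exact construction or scaling, so the helper name augment, the penalty weight lam, and the random data are all hypothetical.

```python
import numpy as np


def augment(X, y, lam):
    """Build an augmented data set (hypothetical helper, not from the paper).

    Ordinary least squares on (X_aug, y_aug) has the same solution as
    minimizing ||y - X b||^2 + lam * ||b||^2 on the original (X, y).
    """
    n, p = X.shape
    # Stack sqrt(lam) * I under X: p extra pseudo-samples, one per gene.
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    # The pseudo-samples all have a response of zero.
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug


# Synthetic small-sample setting: far more genes (p) than samples (n).
rng = np.random.default_rng(0)
n, p, lam = 10, 50, 1.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

X_aug, y_aug = augment(X, y, lam)

# Least squares on the augmented data (now n + p rows, full column rank).
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# Closed-form L2-penalized solution on the original data, for comparison.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.allclose(beta_aug, beta_ridge))
```

The augmented matrix has n + p rows and full column rank for any lam > 0, which is why it can help when the number of samples is smaller than the number of genes.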
1 Introduction
DNA microarray technology has been widely employed to obtain microarray data
which provides useful information for extracting disease-relevant genes, diagno-
sis, and classification of disease, etc. [1]. This measurement inevitably involves
destroying the actual system (or cells), which means that sample sizes are small
[2] and many important genes can be highly correlated. The number of genes
is significantly greater than the number of samples, while only a small number
of the thousands of genes show strong correlation with a certain phenotype [3].
Therefore, gene selection is crucial for disease diagnosis and treatment,
biological experiments, and decision making.
Discriminant analysis of microarray data can be referred to as feature selection
in machine learning [4]. Depending on how the feature evaluation index is
calculated, existing feature selection methods fall into three
categories: filters, wrappers, and embedded methods. The filter approach [5] is
widely used and relies on gene ranking. However, its drawback is that such a
selection procedure is independent of the specific required prediction/classification
 