- Measurements from sensors deployed in different parts of a machine. A task here is to find the parts that affect sensitivity during operation, as well as the individual sensors within them that matter.
- Gene expression measured at the exon level. Exons correspond to the coding regions of genes, and they are translated into proteins that function in cells. Identifying clinically important genes, as well as detecting differential usage of their exons, is an important task in biomedical studies.
- Genes that belong to different cellular components, different biological processes, or different molecular functions according to the Gene Ontology.¹
In the examples above, groups of features are used to represent associations of
features that come from our prior knowledge.
Another type of group comes from our design of features, for instance, when we perform feature selection on multinomial covariates. Suppose that a feature z ∈ {A, B, C} is represented with dummy variables x₁ and x₂, so that (x₁, x₂) = (0, 1), (1, 0), and (1, 1) represent A, B, and C, respectively. When z is relevant, it makes sense to select both x₁ and x₂; otherwise, neither variable should be selected. Therefore dummy variables that correspond to the same multinomial variable have to be considered as a group.
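As an illustration, here is a minimal sketch of this dummy coding in Python. The encoding follows the text above, while the function name, sample data, and group labels are illustrative assumptions of ours.

```python
import numpy as np

# Dummy coding for a three-level feature z in {A, B, C}, following the text:
# (x1, x2) = (0, 1), (1, 0), (1, 1) represent A, B, and C, respectively.
LEVELS = {"A": (0, 1), "B": (1, 0), "C": (1, 1)}

def encode(z_values):
    """Map categorical values of z to the dummy columns (x1, x2)."""
    return np.array([LEVELS[v] for v in z_values])

X = encode(["A", "C", "B", "A"])  # shape (4, 2): columns are x1 and x2
groups = np.array([0, 0])         # both columns belong to the same group (z)
print(X)
```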
For both scenarios, the same methods can be applied for grouped feature selection. We will focus on the first type, where groups represent our prior knowledge of the features. In this chapter we discuss feature selection methods that can extract features at both the individual and the group level. We focus on a popular shrinkage method called the lasso, and on its extensions for handling grouped features. These methods are often referred to as embedded feature selection methods in machine learning, or as penalized (regularized) regression methods in statistics. A characteristic they share is that feature selection is integrated with learning the predictor, so there is no need to perform the two steps separately.
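To make this embedded character concrete, the following is a minimal sketch of lasso-based feature selection using scikit-learn's Lasso; the synthetic data and the penalty strength alpha are assumptions chosen for illustration, not details from the text.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))           # 100 samples, 20 features
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]             # only the first 3 features matter
y = X @ beta_true + 0.1 * rng.standard_normal(100)

model = Lasso(alpha=0.1).fit(X, y)           # fitting and selection in one step
selected = np.flatnonzero(model.coef_ != 0)  # nonzero coefficients = selected
print("selected features:", selected)
```

Note that selection falls out of the fit itself: the penalty drives some coefficients exactly to zero, and those features are simply dropped, with no separate selection step.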
14.1.1 Regularized Regression
The methods we will discuss in this chapter can be described as optimization problems with a canonical convex minimization formulation,

    min_{β₀ ∈ ℝ, β ∈ ℝᵖ}  f(β₀, β) + Ω(β).        (14.1)

Here the first part f(β₀, β) of the objective function represents the amount of loss or error incurred by making incorrect predictions. The second part Ω(β) is called a regularizer or a penalty term, which is used to induce certain structure (for example, sparsity) on the coefficient vector β.
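For concreteness, a minimal sketch of the objective in (14.1), instantiated with squared-error loss for f and the lasso penalty Ω(β) = λ‖β‖₁, might look as follows; the penalty weight lam is an assumed example value.

```python
import numpy as np

def objective(beta0, beta, X, y, lam=0.1):
    """f(beta0, beta) + Omega(beta): squared-error loss plus an L1 penalty."""
    residuals = y - (beta0 + X @ beta)
    loss = 0.5 * np.sum(residuals ** 2)    # f(beta0, beta): prediction error
    penalty = lam * np.sum(np.abs(beta))   # Omega(beta); intercept unpenalized
    return loss + penalty
```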
¹ http://www.geneontology.org
 