Chapter 14
Signature Selection for Grouped Features
with a Case Study on Exon Microarrays
Sangkyun Lee
Abstract When features are grouped, it is desirable to perform feature selection
groupwise in addition to selecting individual features. This is typically the case for
data obtained by modern high-throughput genomic profiling technologies such as
exon microarrays, which measure the amount of gene expression at fine resolution.
Exons are disjoint subsequences corresponding to coding regions in genes, and exon
microarrays enable us to study the differential usage of exons, called alternative
splicing, which is presumed to contribute to the development of diseases. To identify
such events, all exons that belong to a relevant gene may have to be selected, perhaps
with different weights assigned to them to detect the most relevant ones. In this chapter
we discuss feature selection methods that handle grouped features. Our focus will be
a popular shrinkage method, the lasso, and its variants, which are based on regularized
regression with generalized linear models. Data from exon microarrays will be used
for a case study.
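To make "select a whole gene, but weight its exons differently" concrete, here is a minimal sketch, assuming the sparse group lasso penalty, a lasso variant that adds an elementwise l1 term to a sum of groupwise l2 norms weighted by the square root of group size. The sketch shows that penalty's proximal operator, the shrinkage step applied repeatedly by proximal gradient solvers. For non-overlapping groups, this operator equals elementwise soft-thresholding followed by groupwise shrinkage. The function names, the parameters `lam1`/`lam2`, and the sqrt(group size) weighting are illustrative assumptions, not necessarily the exact formulation used in this chapter.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def group_shrink(v, t):
    """Groupwise shrinkage: proximal operator of t * ||v||_2.

    Scales the whole group toward zero, and sets it exactly to zero
    when its Euclidean norm does not exceed t.
    """
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def sparse_group_prox(beta, groups, lam1, lam2):
    """Proximal operator of the (assumed) sparse group lasso penalty

        lam1 * ||beta||_1 + lam2 * sum_g sqrt(p_g) * ||beta_g||_2

    for disjoint groups. `groups` is a list of index lists, one per group
    (e.g. the exons of one gene). Coefficients are first soft-thresholded
    individually, then each group is shrunk (or zeroed) as a unit.
    """
    out = soft_threshold(np.asarray(beta, dtype=float), lam1)
    for idx in groups:
        idx = np.asarray(idx)
        weight = np.sqrt(len(idx))  # hypothetical sqrt(group size) weighting
        out[idx] = group_shrink(out[idx], lam2 * weight)
    return out


# Two hypothetical genes with three exons each.
beta = [2.0, 1.5, 0.2, 0.4, -0.2, 0.1]
groups = [[0, 1, 2], [3, 4, 5]]
print(sparse_group_prox(beta, groups, lam1=0.25, lam2=0.3))
```

With these values, every coefficient of the second gene is set to zero, so the gene is dropped as a group. The first gene is kept. Two of its exons retain nonzero coefficients of different magnitudes, and its third exon is zeroed by the l1 term. The result is selection at both the group and the individual-feature level.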
Keywords Penalized regression · Lasso · Group lasso · Sparsity · Convex regularization
14.1 Introduction
Group information about features provides a way to perform feature selection at
different resolutions. That is, not only individual features (high resolution) but also
groups comprising these features (low resolution) can be considered for selection
when they are relevant. Examples of group information include:
Population census data, where each record consists of demographic, economic, and
social features. Grouped feature selection may identify that demographic features
are important for predicting years of education,