SOME RECENT RESULTS IN MODEL SELECTION - Quantitative Medical Data Analysis Using Mathematical Tools and Statistical Techniques

Biomedical Engineering Reference

In-Depth Information

paths of problem (P1), while parameter 1 takes a range of values. More

importantly, most of these algorithms only take polynomial number of oper-

ations | i.e., they are polynomial-time algorithms. In fact, the complexity

of nding a solution path for (P1) is the same as implementing an ordinary

least square t 14 .

The main objective of this chapter is to nd when (P0) and (P1) give

the same result in the subset selection. Following Huo and Ni 28 , a subset

that corresponds to the indices of the nonzero entries of the minimizer of

(P0) (respectively, (P1)) is called a type-I (respectively, type-II) optimal

subset with respective to 0 (respectively, 1 ). A subset that is both type-I

and type-II optimal is called a concurrent optimal subset. It is known that

there exists a necessary and sucient condition for the type-II optimal

subset, and this condition can be veried in polynomial time. However,

in general, there is no polynomial-time necessary and sucient condition

for the type-I optimal subset. Therefore, we search for easy-to-verify (i.e.,

polynomial-time) sucient conditions for type-I optimal subsets. Two types

of results are derived in Huo and Ni 28 . The rst is based on the assumption

that the covariates whose correlations with the response vector are higher

than the others form the optimal subset. The second result is motivated by

a new advance in sparse signal representation, and is rather general. Since

this is not the theme of this chapter, we omit further details.

3. Formulation and Related Literature

We review more literature in subset selection. Recall in a regression setting,

2

R n are

coecient and response vectors. The columns of matrix are covariates.

A regression model is y = x + ", where " is a random vector. Let I =

f1; 2; : : : ; mgdenote all the indices of the coecients. A subset of coecients

(or, covariates) is denoted by I. Letjjdenote the cardinality of set .

Let x denote the coecient vector that only takes nonzero values when the

coecient indices are in the subset . A subset selection problem has two

competing objectives in choosing a subset : rstly, the residuals, which

are in the vector yx , are close to zeros; secondly, the size of the set

is small.

R nm (n > m) denotes a model matrix. Vectors x2

R m and y2

3.1. Subset Selection Criteria and (P0)

Rich literature can be found on the criteria regarding subset selection.

Miller 36 and George and Foster 22 provided an excellent overview. An inter-

Search WWH ::

Custom Search

Home