Bayesian Variable Selection for Multi-response
Linear Regression
Wan-Ping Chen 1, Ying Nian Wu 1, and Ray-Bin Chen 2
1 Department of Statistics,
University of California, Los Angeles, CA, USA
2 Department of Statistics,
National Cheng Kung University, Tainan, Taiwan, ROC
Abstract. This paper studies the variable selection problem in high dimensional
linear regression, where there are multiple response vectors, and they share the
same or similar subsets of predictor variables to be selected from a large set of
candidate variables. In the literature, this problem is called multi-task learning,
support union recovery or simultaneous sparse coding in different contexts. In
this paper, we propose a Bayesian method for solving this problem by introducing
two nested sets of binary indicator variables. In the first set of indicator variables,
each indicator is associated with a predictor variable or a regressor, indicating
whether this variable is active for any of the response vectors. In the second set
of indicator variables, each indicator is associated with both a predictor variable
and a response vector, indicating whether this variable is active for the particular
response vector. The problem of variable selection can then be solved by sam-
pling from the posterior distributions of the two sets of indicator variables. We
develop the Gibbs sampling algorithm for posterior sampling and demonstrate
the performance of the proposed method on both simulated and real data sets.
Keywords: Multi-task learning, Support union recovery, Simultaneous sparse
coding.
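The nested indicator structure described in the abstract can be illustrated concretely. The sketch below is a minimal illustration, not the authors' sampler: the array names `gamma` (first-level indicators, one per predictor) and `eta` (second-level indicators, one per predictor-response pair) and the example active sets are assumptions chosen for demonstration; only the nesting constraint between the two levels comes from the text.

```python
import numpy as np

p, K = 10, 3  # p candidate predictors, K response vectors (illustrative sizes)

# First-level indicators: gamma[j] = 1 if predictor j is active
# for at least one of the K response vectors.
gamma = np.zeros(p, dtype=int)
gamma[[1, 4, 7]] = 1  # hypothetical active predictors

# Second-level indicators: eta[j, k] = 1 if predictor j is active
# for the particular response vector k.  The nesting constraint is
# that eta[j, k] can be 1 only when gamma[j] = 1.
eta = np.zeros((p, K), dtype=int)
eta[1, :] = 1          # predictor 1 active for all responses
eta[4, [0, 2]] = 1     # predictor 4 active for responses 0 and 2
eta[7, 1] = 1          # predictor 7 active for response 1 only

# Verify the nesting constraint holds row by row.
assert np.all(eta.max(axis=1) <= gamma)

# Support union recovery: the union of the per-response supports is
# exactly the set flagged by the first-level indicators.
support_union = np.where(eta.max(axis=1) == 1)[0]
print(support_union.tolist())  # -> [1, 4, 7]
```

In the proposed Bayesian method, these two arrays would be treated as random and sampled from their posterior distributions via Gibbs sampling, rather than fixed by hand as above.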
1 Introduction
Variable selection is a fundamental problem in linear regression, especially in modern
applications where the number of predictor variables or regressors can exceed the num-
ber of observations. Under the sparsity assumption that the number of active variables
is small, it is possible to select these active variables even if the number of candidate
variables is very large.
During the past decade, the problem of variable selection in high dimensional linear
regression has been intensely studied in statistics, machine learning and signal pro-
cessing. Many variable selection methods have been developed, such as the Lasso by
Tibshirani (1996) [14], SCAD by Fan and Li (2001) [5], elastic net by Zou and Hastie
(2005) [19], and MCP by Zhang (2010) [18]. In addition to these penalized least squares
methods, Bayesian approaches have also been proposed, for example, stochastic search
variable selection (SSVS) by George and McCulloch (1993) [7], Gibbs variable selec-
tion (GVS) by Dellaportas et al. (2000) [4], and RVM by Tipping (2005) [15].