Algorithm 1: Group-Wise Gibbs Sampler for Support Recovery
1. Randomly select a variable $X_j$. Compute $R_{j,m} = Y_m - \sum_{i \neq j} X_i \beta_{i,m}$, for $m = 1, \ldots, M$.
2. Compute the likelihood ratio $Z_j$ according to Eq. (6), and then evaluate the posterior probability of $\delta_j$:
$$P(\delta_j = 1 \mid Y, \delta_{-j}, \{\beta_{-j,m}, m = 1, \ldots, M\}, \sigma) = \frac{(1 - \theta_j) Z_j}{(1 - \theta_j) Z_j + \theta_j}. \qquad (7)$$
3. Sample $\delta_j$ based on the posterior probability in (7). If $\delta_j = 0$, then set $\beta_{j,m} = 0$, $m = 1, \ldots, M$; otherwise, sample $\beta_{j,m} \sim N(r_{j,m}, \sigma^2_{j,m})$.
4. After repeating the above steps for all variables, compute the current residual matrix, $\mathrm{Res} = Y - XB$. Then sample $\sigma^2 \sim IG\!\left(a + \frac{n}{2},\; \frac{\mathrm{tr}(\mathrm{diag}(\mathrm{Res}^\top \mathrm{Res}))/M + b}{2}\right)$. Go to Step 1.
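The sweep above can be sketched in Python. This is a minimal illustration, not the authors' implementation: Eq. (6) is not reproduced in this excerpt, so the likelihood ratio below uses a standard Gaussian spike-and-slab formula as a stand-in, a single slab variance `tau2` replaces the per-entry variances, and the inverse-gamma hyperparameters `a` and `b` are assumed.

```python
import numpy as np

def gibbs_sweep(X, Y, B, delta, theta, sigma2, tau2, a=1.0, b=1.0, rng=None):
    """One full sweep of a group-wise Gibbs sampler in the spirit of Algorithm 1.

    X: (n, p) design, Y: (n, M) responses, B: (p, M) coefficients,
    delta: (p,) binary indicators, theta: (p,) prior P(delta_j = 0),
    sigma2: noise variance, tau2: slab prior variance (assumed shared here).
    """
    rng = rng or np.random.default_rng()
    n, p = X.shape
    M = Y.shape[1]
    for j in rng.permutation(p):
        # Step 1: residual with the j-th variable's contribution removed
        R_j = Y - (X @ B - np.outer(X[:, j], B[j]))
        xx = X[:, j] @ X[:, j]
        post_var = 1.0 / (xx / sigma2 + 1.0 / tau2)       # conjugate posterior variance
        post_mean = post_var * (X[:, j] @ R_j) / sigma2   # one mean per response
        # Step 2: spike-and-slab marginal likelihood ratio (stand-in for Eq. (6))
        log_Z = 0.5 * M * np.log(post_var / tau2) + 0.5 * np.sum(post_mean**2) / post_var
        Z_j = np.exp(min(log_Z, 700.0))                   # guard against overflow
        prob = (1 - theta[j]) * Z_j / ((1 - theta[j]) * Z_j + theta[j])  # Eq. (7)
        # Step 3: sample delta_j, then beta_{j,m}
        delta[j] = int(rng.random() < prob)
        if delta[j]:
            B[j] = post_mean + np.sqrt(post_var) * rng.standard_normal(M)
        else:
            B[j] = 0.0
    # Step 4: sample sigma^2 from its inverse-gamma full conditional
    Res = Y - X @ B
    shape = a + n / 2.0
    scale = (np.trace(Res.T @ Res) / M + b) / 2.0
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / scale)          # IG draw via 1/Gamma
    return B, delta, sigma2
```

Repeating `gibbs_sweep` yields a Markov chain over $(B, \delta, \sigma^2)$ whose $\delta$ draws estimate the support.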
3.2 Two-Layer Structure and Two-Layer Gibbs Sampler
In the group selection methods, once a variable, $X_j$, is selected, $X_j$ is active for all the responses, $Y_1, \ldots, Y_M$. However, we can further assume that the selected variable might not be active for all response vectors simultaneously. In other words, we are interested in finding the best union of support sets, $S$, and we also assume that a variable in $S$ might be inactive for some response vectors. Therefore, unlike the single-indicator set-up in the group-wise Gibbs sampler, two nested sets of binary indicator variables are used. The first set of indicators, $\delta = (\delta_1, \ldots, \delta_p)$, is associated with the variables $X_1, \ldots, X_p$, respectively, and $\delta_j$ indicates whether the variable $X_j$ is active for any of the response vectors. Specifically, if $\delta_j = 1$, then the variable $X_j$ is selected, and $\delta_j = 0$ otherwise. In the second indicator set, each indicator is associated with a variable and a response vector, indicating whether this variable is active for explaining that particular response vector. Thus for each variable $X_j$, we define the indicator vector $\eta^{(j)} = (\eta_{j,1}, \ldots, \eta_{j,M})$; if $\eta_{j,m} = 1$, the variable $X_j$ is active for the $m$-th response, $Y_m$, and $\eta_{j,m} = 0$ otherwise.
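To make the nested-indicator structure concrete, here is a small numerical illustration; the specific values of $\delta$ and $\eta$ are invented for the example.

```python
import numpy as np

# Hypothetical indicators for p = 4 variables and M = 3 responses.
delta = np.array([1, 0, 1, 0])  # first layer: is the variable selected at all?
eta = np.array([[1, 0, 1],      # second layer: eta[j, m] only matters when
                [0, 0, 0],      #   delta[j] = 1; when delta[j] = 0 the whole
                [1, 1, 0],      #   row eta[j, :] is forced to 0
                [0, 0, 0]])

# Union of support sets: S = {j : delta_j = 1}
S = {j for j in range(4) if delta[j] == 1}
# Per-response supports: S_m = {j : delta_j * eta_{j,m} = 1}
S_m = [{j for j in range(4) if delta[j] * eta[j, m] == 1} for m in range(3)]

print(S)       # {0, 2}
print(S_m)     # [{0, 2}, {2}, {0}]
assert S == set().union(*S_m)  # S is exactly the union of per-response supports
```

Here variable 2 is in $S$ but inactive for the third response, which is precisely the flexibility the second indicator layer adds.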
Similar to the group-wise Gibbs sampler, the prior distribution of $\delta_j$ is also assumed to follow the Bernoulli distribution with $P(\delta_j = 0) = \theta_j$ and $P(\delta_j = 1) = 1 - \theta_j$, i.e. $\delta_j \sim \mathrm{Ber}(1 - \theta_j)$. Consider the prior assumption for the second set of indicators. Following Chen et al. (2014) [3], the prior distribution of an indicator in the second set, $\eta_{j,m}$, is chosen as a mixture distribution depending on the indicator in the first set, $\delta_j$, and is represented as
$$\eta_{j,m} \mid \delta_j \sim (1 - \delta_j)\,\delta_0 + \delta_j\,\mathrm{Ber}(1 - \rho_{j,m}), \qquad (8)$$
where $P(\eta_{j,m} = 0) = \rho_{j,m}$. Based on Eq. (8), if the $j$-th variable, $X_j$, is not selected in $S$, i.e. $\delta_j = 0$, then $\eta_{j,m} = 0$ for all $m = 1, \ldots, M$; however, when $\delta_j = 1$, $\eta_{j,m}$ can still be 0 or 1 due to the Bernoulli prior distribution. Then for the coefficient, $\beta_{j,m}$, given the indicators $\delta_j$ and $\eta_{j,m}$, the prior distribution of $\beta_{j,m}$ can be defined as
$$\beta_{j,m} \mid \delta_j, \eta_{j,m} \sim (1 - \delta_j \eta_{j,m})\,\delta_0 + \delta_j \eta_{j,m}\, N(0, \tau_{j,m}). \qquad (9)$$
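A draw from the two-layer prior defined by Eqs. (8) and (9) can be sketched as follows; the function name and the hyperparameter values in the test are our own illustrative choices, not from the source.

```python
import numpy as np

def sample_two_layer_prior(theta, rho, tau, rng):
    """Draw (delta, eta, beta) from the two-layer prior of Eqs. (8)-(9).

    theta: (p,) with P(delta_j = 0) = theta_j
    rho:   (p, M) with P(eta_{j,m} = 0) = rho_{j,m}
    tau:   (p, M) slab variances tau_{j,m}
    """
    p, M = rho.shape
    # First layer: delta_j ~ Ber(1 - theta_j)
    delta = (rng.random(p) < 1 - theta).astype(int)
    # Eq. (8): eta_{j,m} | delta_j ~ (1 - delta_j) delta_0 + delta_j Ber(1 - rho_{j,m})
    # (the delta[:, None] factor realizes the point mass at 0 when delta_j = 0)
    eta = delta[:, None] * (rng.random((p, M)) < 1 - rho).astype(int)
    # Eq. (9): beta_{j,m} is 0 unless delta_j * eta_{j,m} = 1, else N(0, tau_{j,m})
    beta = delta[:, None] * eta * rng.normal(0.0, np.sqrt(tau))
    return delta, eta, beta
```

Note how the spike component $\delta_0$ in both mixtures is realized simply by multiplying with the corresponding indicator.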