8.6.4 Sparse Bayesian Learning (SBL) and Automatic Relevance Determination (ARD)
Sparse Bayesian learning (SBL) uses the same Gaussian likelihood model defined in (8.8) and a hierarchical Bayes formulation similar to that explained for MAP estimation, but instead of integrating out the hyperparameters as in parameter MAP estimation, SBL integrates out the parameters [45, 44, 55, 89, 100, 79, 69, 99, 101, 60]. Thus, instead of finding point estimates at the posterior modes using fixed priors, SBL performs an evidence maximization procedure that learns adaptive hyperparameters from the data itself. SBL assumes an automatic relevance determination (ARD) prior for the current density, defined as
$$p(\mathbf{J} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{d_\alpha} \mathcal{N}\!\left(\mathbf{J}_{i:} \mid \mathbf{0},\, \alpha_i^{-1}\mathbf{I}\right), \qquad (8.27)$$
where $\boldsymbol{\alpha}$ is a vector of hyperparameters or precisions (i.e., inverse source variances), $d_\alpha$ is the number of hyperparameters, and each $\mathbf{J}_{i:}$ has a zero-mean Gaussian prior with covariance $\alpha_i^{-1}\mathbf{I}$. The inverse source and noise variances have Gamma hyperpriors,
$$p(\boldsymbol{\alpha}) = \prod_{i=1}^{d_\alpha} \mathrm{Gamma}(\alpha_i \mid a, b), \qquad (8.28)$$
$$p(\sigma_\Upsilon^{-2}) = \mathrm{Gamma}(\sigma_\Upsilon^{-2} \mid c, d), \qquad (8.29)$$
where $a$, $b$, $c$, and $d$ are the degrees-of-freedom parameters of the Gamma distributions of $\sigma_\Upsilon^{-2}$ and $\boldsymbol{\alpha}$, given by $\mathrm{Gamma}(\alpha \mid a, b) = \Gamma(a)^{-1} b^{a} \alpha^{a-1} e^{-b\alpha}$, with $\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\, dt$. The Gamma hyperprior results in a Student-t prior for the source parameters. However, to avoid tuning the hyperprior, the Gamma distribution parameters can be set to a small number (e.g., $a = b = c = d = 10^{-4}$) to make these priors noninformative (i.e., flat in log space, as is common for scale parameters), or they can be made exactly zero, in which case we obtain the Jeffreys prior, which results in scale invariance.
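As a numerical illustration of the Student-t claim, the minimal NumPy/SciPy sketch below samples precisions from the Gamma hyperprior of (8.28), draws Gaussian sources as in (8.27), and compares the resulting marginal against the implied Student-t distribution. The values $a = 3$, $b = 2$ are illustrative assumptions, not from the chapter; the noninformative setting $a = b = 10^{-4}$ yields an essentially improper, extremely heavy-tailed marginal that cannot be sampled stably.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative hyperparameters (assumed for this sketch, not from the chapter).
a, b = 3.0, 2.0
n = 200_000

# (8.28): precisions alpha ~ Gamma(shape=a, rate=b); NumPy parameterizes the
# Gamma by shape and scale, so scale = 1 / rate.
alpha = rng.gamma(shape=a, scale=1.0 / b, size=n)

# (8.27): each source drawn as a zero-mean Gaussian with variance alpha^{-1}.
J = rng.normal(size=n) / np.sqrt(alpha)

# Integrating alpha out analytically yields a Student-t with 2a degrees of
# freedom and scale sqrt(b / a); the KS statistic should be close to zero.
t_marginal = stats.t(df=2 * a, scale=np.sqrt(b / a))
print("KS statistic:", stats.kstest(J, t_marginal.cdf).statistic)
```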
SBL is an important alternative because the posterior mode may not be representative of the full posterior; a better point estimate, the posterior mean, may therefore be obtained by tracking the posterior probability mass. In the case of the Jeffreys prior, this is achieved by finding the maximum likelihood hyperparameters $\boldsymbol{\alpha}^{(ml)}$ and $\sigma_\Upsilon^{2\,(ml)}$ that maximize a tractable Gaussian approximation of the evidence of the hyperparameters, also known as the type-II likelihood or marginal likelihood,

$$\hat{\boldsymbol{\alpha}}^{(ml)}, \hat{\sigma}_\Upsilon^{2\,(ml)} = \operatorname*{arg\,max}_{\boldsymbol{\alpha},\, \sigma_\Upsilon^{2}}\; p(\mathbf{B} \mid \boldsymbol{\alpha}, \sigma_\Upsilon^{2}), \qquad p(\mathbf{B} \mid \boldsymbol{\alpha}, \sigma_\Upsilon^{2}) = \int p(\mathbf{B} \mid \mathbf{J},\, \sigma_\Upsilon^{2})\, p(\mathbf{J} \mid \boldsymbol{\alpha})\, d\mathbf{J} = \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_B), \qquad (8.30)$$
or equivalently by minimizing the negative log marginal likelihood $-\log p(\mathbf{B} \mid \boldsymbol{\alpha}, \sigma_\Upsilon^{2})$.
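To make this objective concrete, the sketch below evaluates the negative log marginal likelihood under the common assumption (not spelled out in this excerpt) that the Gaussian likelihood of (8.8) is linear, $\mathbf{B} = \mathbf{L}\mathbf{J} + \boldsymbol{\Upsilon}$, with lead field $\mathbf{L}$ and i.i.d. noise variance $\sigma_\Upsilon^{2}$, so that integrating out $\mathbf{J}$ as in (8.30) gives $\boldsymbol{\Sigma}_B = \sigma_\Upsilon^{2}\mathbf{I} + \mathbf{L}\,\mathrm{diag}(\boldsymbol{\alpha})^{-1}\mathbf{L}^\top$. All names and shapes are illustrative assumptions, not the chapter's notation.

```python
import numpy as np

def neg_log_marginal_likelihood(B, L, alpha, sigma2):
    """-log p(B | alpha, sigma2) up to an additive constant, cf. (8.30).

    B      : (n_sensors, n_times) data matrix
    L      : (n_sensors, d_alpha) lead field (assumed linear forward model)
    alpha  : (d_alpha,) source precisions (ARD hyperparameters)
    sigma2 : scalar noise variance
    """
    n_sensors, n_times = B.shape
    # Model covariance of each data column after integrating out J:
    # Sigma_B = sigma2 * I + L diag(1/alpha) L^T.
    Sigma_B = sigma2 * np.eye(n_sensors) + (L / alpha) @ L.T
    _, logdet = np.linalg.slogdet(Sigma_B)
    # Quadratic term summed over columns: trace(Sigma_B^{-1} B B^T).
    quad = np.trace(np.linalg.solve(Sigma_B, B @ B.T))
    return 0.5 * (n_times * logdet + quad)
```

A practical SBL implementation would minimize this quantity over $\boldsymbol{\alpha}$ and $\sigma_\Upsilon^{2}$ with EM or fixed-point updates; hyperparameters $\alpha_i$ driven toward infinity correspond to sources pruned to zero variance, which is the ARD mechanism that produces sparse solutions.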