6.3.2.3 Analysis of γ-MAP
Previously, we have claimed that the γ-MAP process naturally forces excessive/irrelevant hyperparameters to converge to zero, thereby reducing model complexity. Note that, somewhat counterintuitively, this occurs even when a flat hyperprior is assumed. While this observation has been verified empirically by ourselves and others in various application settings, there has been relatively little corroborating theoretical evidence, largely because of the difficulty in analyzing the potentially multimodal, non-convex γ-MAP cost function. We can then show that: every local minimum of the generalized γ-MAP cost function is achieved at a solution with at most rank(y) ≤ d_y non-zero hyperparameters if f_i(γ_i) is concave and non-decreasing for all i, including flat hyperpriors. Therefore, we can be confident that the pruning mechanism of γ-MAP is not merely an empirical phenomenon. Nor is it dependent on a particular sparse hyperprior; the result holds even when a flat (uniform) hyperprior is assumed.
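To make this pruning concrete, the following is a minimal sketch of the familiar EM-style γ-MAP update for the simplest version of the model, assuming observations y = Ls + noise with isotropic noise variance lam, a diagonal source covariance Γ = diag(γ) with one hyperparameter per candidate component, and a flat hyperprior; the lead field L, the dimensions, and the synthetic data are illustrative stand-ins rather than anything from this chapter. Running it drives most entries of γ to zero, which is the pruning behavior discussed above.

```python
import numpy as np

def gamma_map(Y, L, lam=1e-2, n_iter=200, prune_tol=1e-8):
    """EM-style gamma-MAP (sparse Bayesian learning) sketch.

    Y   : (d_y, n) observation matrix (n observation vectors / time points)
    L   : (d_y, d_s) lead-field / design matrix
    lam : assumed noise variance (flat hyperprior on gamma)
    """
    d_y, n = Y.shape
    d_s = L.shape[1]
    gamma = np.ones(d_s)                     # one hyperparameter per component

    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        Sigma_y = lam * np.eye(d_y) + L @ Gamma @ L.T      # model data covariance
        Sigma_y_inv = np.linalg.inv(Sigma_y)

        # Posterior mean and covariance of the sources given the current gamma
        Mu = Gamma @ L.T @ Sigma_y_inv @ Y                  # (d_s, n)
        Sigma_s = Gamma - Gamma @ L.T @ Sigma_y_inv @ L @ Gamma

        # EM update: per-component second moment, averaged over the n columns
        gamma = np.mean(Mu**2, axis=1) + np.diag(Sigma_s)

        # Irrelevant components collapse toward zero and can be pruned
        gamma[gamma < prune_tol] = 0.0

    return gamma, Mu

# Tiny synthetic example: only 3 of 50 candidate components are active
rng = np.random.default_rng(0)
L = rng.standard_normal((20, 50))
S_true = np.zeros((50, 10))
S_true[[4, 17, 33], :] = rng.standard_normal((3, 10))
Y = L @ S_true + 0.05 * rng.standard_normal((20, 10))

gamma, S_hat = gamma_map(Y, L)
print("non-zero hyperparameters:", np.flatnonzero(gamma > 1e-6))
```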
The number of observation vectors n also plays an important role in shaping γ-MAP solutions. Increasing n has two primary benefits: (i) it facilitates convergence to the global minimum (as opposed to getting stuck in a suboptimal extremum) and (ii) it improves the quality of this minimum by mitigating the effects of noise. Finally, a third benefit of using n > 1 is that it leads to temporal smoothing of the estimated time courses (i.e., the rows of ŝ). This occurs because the selected covariance components do not change across time, as would be the case if a separate set of hyperparameters were estimated at each time point. For purposes of model selection, a rigorous bound on log p(y) can be derived using principles from convex analysis that have been successfully applied in general-purpose probabilistic graphical models (see Wipf and Nagarajan [13]).
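The temporal-smoothing effect can be read off the posterior mean. Assuming the simplest Gaussian form of the model used above (lead field L, learned covariance Γ̂ = diag(γ̂), isotropic noise variance λ; a sketch whose notation may differ slightly from the chapter's), the estimate applies one fixed spatial filter to every observation vector:

$$\hat{s} \;=\; \hat{\Gamma} L^{\top}\bigl(\lambda I + L\hat{\Gamma}L^{\top}\bigr)^{-1} y .$$

Each column (time point) of y passes through the same filter, so any row of ŝ whose hyperparameter has been pruned to zero is zero at every time point, and the surviving rows share a single, temporally stable support rather than one re-estimated at each time point.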
6.3.3 Source MAP or Penalized Likelihood Methods
The second option is to integrate out the unknown γ; we can then treat p(s) as the effective prior and attempt to compute a MAP estimate of s via

$$\hat{s} \;=\; \arg\max_{s}\; p(y \mid s)\int p(s \mid \gamma)\, p(\gamma)\, d\gamma \;=\; \arg\max_{s}\; p(y \mid s)\, p(s) \qquad (6.22)$$
While it may not be immediately transparent, solving s-MAP also leads to a shrinking and pruning of superfluous covariance components. In short, this occurs because the hierarchical model upon which it is based leads to a convenient, iterative EM algorithm-based implementation, which treats the hyperparameters γ as hidden data and computes their expectation for the E-step. Over the course of learning, this expectation collapses to zero for many of the irrelevant hyperparameters, removing them from the model in much the same way as γ-MAP.
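As an illustration of these EM mechanics (a sketch under simplifying assumptions, not the algorithm as developed in this chapter): with independent priors p(s_i | γ_i) = N(0, γ_i) and an improper Jeffreys-type hyperprior, the E-step expectation works out to E[1/γ_i | s_i] = 1/s_i², so the effective hyperparameter update is γ_i ← s_i² and the M-step becomes a reweighted ridge regression, a FOCUSS-style iteration. Coefficients attached to irrelevant components, and hence their expected hyperparameters, collapse to zero over the iterations.

```python
import numpy as np

def s_map_em(y, L, lam=1e-2, n_iter=100, eps=1e-12):
    """EM sketch for s-MAP with p(s_i | gamma_i) = N(0, gamma_i) and a
    Jeffreys-type hyperprior, for which the E-step reduces to the
    FOCUSS-style effective update gamma_i <- s_i**2 (an illustrative
    special case, not the general algorithm).

    y : (d_y,) single observation vector
    L : (d_y, d_s) lead-field / design matrix
    """
    d_y, d_s = L.shape
    s = L.T @ np.linalg.solve(L @ L.T, y)   # minimum-norm initialization
    for _ in range(n_iter):
        # E-step (effective): expected hyperparameters given the current s
        gamma = s**2
        # M-step: weighted ridge regression with the current weights gamma
        G = L * gamma                        # = L @ diag(gamma), shape (d_y, d_s)
        s = G.T @ np.linalg.solve(lam * np.eye(d_y) + G @ L.T, y)
        s[np.abs(s) < eps] = 0.0             # pruned sources stay at zero
    return s

# Example: sparse recovery of 3 active sources out of 50
rng = np.random.default_rng(1)
L = rng.standard_normal((20, 50))
s_true = np.zeros(50)
s_true[[4, 17, 33]] = rng.standard_normal(3)
y = L @ s_true + 0.01 * rng.standard_normal(20)

s_hat = s_map_em(y, L)
print("recovered support:", np.flatnonzero(np.abs(s_hat) > 1e-6))
```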
 