p(x|y) ∝ p(y|x) p(x), (B.3)
where the notation ∝ means that both sides are equal, ignoring a multiplicative constant.
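As a toy numerical illustration of Eq. (B.3) (a hypothetical discrete example; the numbers are assumptions, not from the text), the right-hand side can be evaluated for each candidate x and then normalized, since the ignored multiplicative constant is fixed by requiring the posterior to sum to one:

```python
import numpy as np

# Hypothetical discrete example: three candidate values of x.
prior = np.array([0.5, 0.3, 0.2])        # p(x)
likelihood = np.array([0.1, 0.4, 0.8])   # p(y|x) for one observed y

unnormalized = likelihood * prior        # right-hand side of Eq. (B.3)
posterior = unnormalized / unnormalized.sum()  # fix the multiplicative constant

print(posterior)
```

Normalizing does not change which x is most probable, which is why the constant can be ignored in Eq. (B.3).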
Bayesian inference uses Bayes' rule in Eq. (B.2) or (B.3) to estimate x. A problem
here is how to determine the prior probability distribution p(x). A general strategy for this problem is to determine the prior probability by taking what we know about the unknown parameters x into account. However, if there is no prior knowledge on x, we must use the uniform prior distribution, i.e.,

p(x) = constant. (B.4)
With Eq. (B.4), Bayes' rule in Eq. (B.3) becomes

p(x|y) ∝ p(y|x). (B.5)
In this case, the posterior probability p(x|y) is equal to the likelihood p(y|x) up to a constant, so the Bayesian and maximum likelihood methods give the same solution. The prior distribution in Eq. (B.4) is referred to as the non-informative prior.
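This equivalence can be checked numerically (a hypothetical sketch; the grid, the observed value y = 1.3, and the unit-variance Gaussian likelihood are all assumptions): with a constant prior, the x maximizing the posterior is exactly the x maximizing the likelihood.

```python
import numpy as np

# Hypothetical sketch: Gaussian likelihood p(y|x) on a grid of candidate x.
x_grid = np.linspace(-3.0, 3.0, 601)
y = 1.3
likelihood = np.exp(-0.5 * (y - x_grid) ** 2)   # unit noise variance (assumed)

prior = np.ones_like(x_grid)        # non-informative prior, Eq. (B.4)
posterior = likelihood * prior      # Eq. (B.5), up to a constant

x_ml = x_grid[np.argmax(likelihood)]
x_map = x_grid[np.argmax(posterior)]
print(x_ml, x_map)  # identical: the two methods give the same solution
```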
Even when some prior information on the unknown parameter x is available, exact probability distributions are generally difficult to determine. Therefore, the probability distribution is usually chosen for convenience in computing the posterior distribution. Some probability distributions give the prior and the posterior the same form for a given likelihood p(y|x). One representative example of such distributions is the Gaussian distribution. That is, if the noise is Gaussian, a Gaussian prior distribution gives a Gaussian posterior distribution. The derivation of the posterior distribution in the Gaussian model is explained in Sect. B.3.
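A numerical check of this Gaussian-conjugacy statement (a hypothetical scalar sketch; the model y = x + noise and all numbers are assumptions, not from the text): the posterior obtained by normalizing likelihood × prior on a grid matches the standard closed-form Gaussian posterior mean and variance.

```python
import numpy as np

# Hypothetical scalar model: y = x + noise, noise ~ N(0, s^2), prior x ~ N(m0, s0^2).
m0, s0 = 0.0, 2.0    # assumed prior mean and standard deviation
s, y = 1.0, 1.5      # assumed noise standard deviation and one observation

# Standard conjugate-Gaussian posterior (closed form).
post_var = 1.0 / (1.0 / s0**2 + 1.0 / s**2)
post_mean = post_var * (m0 / s0**2 + y / s**2)

# Numerical check: normalize likelihood * prior on a dense grid.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
unnorm = np.exp(-0.5 * (y - x)**2 / s**2) * np.exp(-0.5 * (x - m0)**2 / s0**2)
p = unnorm / (unnorm.sum() * dx)
grid_mean = (x * p).sum() * dx
grid_var = ((x - grid_mean)**2 * p).sum() * dx

print(post_mean, grid_mean)  # both ≈ 1.2
print(post_var, grid_var)    # both ≈ 0.8
```

The grid posterior is Gaussian because the product of two Gaussian exponentials is again a Gaussian exponential in x.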
B.2 Point Estimate of Unknown x
In Bayesian inference, the unknown parameter x is estimated based on the posterior probability p(x|y). Then, how can we obtain the optimum estimate x̂ based on p(x|y)?
There are two ways to compute the estimate x̂ based on a given posterior distribution. One way chooses the x that maximizes the posterior, i.e.,
x̂ = argmax_x p(x|y). (B.6)
This x̂ is called the maximum a posteriori (MAP) estimate.
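Eq. (B.6) can be applied directly by evaluating the posterior on a grid (a hypothetical sketch; all numbers are assumptions): with an informative prior, the MAP estimate is pulled away from the maximum of the likelihood alone.

```python
import numpy as np

# Hypothetical sketch: MAP estimate via grid search, with an informative prior.
x_grid = np.linspace(-5.0, 5.0, 100001)
y = 2.0

likelihood = np.exp(-0.5 * (y - x_grid)**2)   # Gaussian noise, unit variance (assumed)
prior = np.exp(-0.5 * x_grid**2 / 0.5**2)     # Gaussian prior centered at 0 (assumed)
posterior = likelihood * prior                # Eq. (B.3), up to a constant

x_map = x_grid[np.argmax(posterior)]          # Eq. (B.6)
print(x_map)  # ≈ 0.4, pulled toward the prior mean from the observation y = 2.0
```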
The other way is to choose the x̂ that minimizes the squared error between the estimate x̂ and the true value x. The squared error is expressed as E[(x̂ − x)ᵀ(x̂ − x)], and the estimate x̂ is obtained using