Assuming that we have a likelihood function that can be computed directly, we would like to be able to write equation 2.23 in terms of these likelihood functions. The following algebraic steps take us there. First, we note that the definition of the likelihood (equation 2.25) gives us a new way of expressing the joint probability term that appears in equation 2.23:

    P(h,d) = P(d|h) P(h)    (2.26)

which can be substituted back into equation 2.23, giving:

    P(h|d) = P(d|h) P(h) / P(d)    (2.27)

This last equation is known as Bayes formula, and it provides the starting point for a whole field known as Bayesian statistics. It allows you to write P(h|d), which is called the posterior in Bayesian terminology, in terms of the likelihood times the prior, which is what P(h) is called. The prior basically indicates how likely the hypothesis is to be true without having seen any data at all: some hypotheses are just more plausible (true more often) than others, and this can be reflected in this term. Priors are often used to favor simpler hypotheses as more likely, but this is not necessary. In our application here, the prior terms will end up being constants, which can actually be measured (at least approximately) from the underlying biology.

As in equation 2.23, the likelihood times the prior is normalized by the probability of the data P(d) in Bayes formula. We can replace P(d) with an expression involving only likelihood and prior terms if we make use of our null hypothesis h̄. Again, we want to use likelihood terms because they can often be computed directly. Because our hypothesis and null hypothesis are mutually exclusive and sum to 1, we can write the probability of the data in terms of the part of it that overlaps with the hypothesis plus the part that overlaps with the null hypothesis:

    P(d) = P(h,d) + P(h̄,d)    (2.28)

In figure 2.21, this amounts to computing P(d) in the top and bottom halves separately, and then adding these results to get the overall result. As we did before to get Bayes formula, these joint probabilities can be turned into conditional probabilities with some simple algebra on the conditional probability definition (equation 2.23), giving us the following:

    P(h̄,d) = P(d|h̄) P(h̄)    (2.29)

    P(d) = P(d|h) P(h) + P(d|h̄) P(h̄)    (2.30)

which can then be substituted into Bayes formula, resulting in:

    P(h|d) = P(d|h) P(h) / [P(d|h) P(h) + P(d|h̄) P(h̄)]    (2.31)

This is now an expression that is strictly in terms of just the likelihoods and priors for the two hypotheses! Indeed, this is the equation that we showed at the outset (equation 2.22), with f(h,d) = P(d|h) P(h) and f(h̄,d) = P(d|h̄) P(h̄). It has a very simple h / (h + h̄) form, which reflects a balancing of the likelihood in favor of the hypothesis with that against it. It is this form that the biological properties of the neuron implement, as we will see more explicitly in a subsequent section.
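To make the computation concrete, here is a minimal sketch of equation 2.31 in Python; the function name posterior and its argument names are our own illustrative choices, not anything defined in the text.

    def posterior(lik_h, prior_h, lik_null, prior_null):
        """Posterior P(h|d) from equation 2.31, using only likelihoods and priors.

        lik_h      -- P(d|h),     likelihood of the data under the hypothesis
        prior_h    -- P(h),       prior probability of the hypothesis
        lik_null   -- P(d|h-bar), likelihood of the data under the null hypothesis
        prior_null -- P(h-bar),   prior probability of the null hypothesis
        """
        # Support for the hypothesis: f(h,d) = P(d|h) P(h)
        support_h = lik_h * prior_h
        # Support for the null hypothesis: f(h-bar,d) = P(d|h-bar) P(h-bar)
        support_null = lik_null * prior_null
        # The denominator is P(d) written in likelihood-and-prior terms (equation 2.30)
        return support_h / (support_h + support_null)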
Before continuing, let's verify that equation 2.31 produces the same result as equation 2.23 for the case we have been considering all along (P(h = 1 | d = 110)). First, we know that the likelihood P(d = 110 | h = 1) according to the table is (2/24) / (12/24) or .167. Also, P(h̄) = .5, and P(h) = .5 as well. The only other thing we need is P(d|h̄), which we can see from the table is (1/24) / (12/24) or .083. The result is thus:

    P(h = 1 | d = 110) = (.167)(.5) / [(.167)(.5) + (.083)(.5)] = .67    (2.32)

So, we can see that this result agrees with the previously computed value. Obviously, if you have the table, this seems like a rather indirect way of computing things, but we will see in the next section how the likelihood terms can be computed without a table.
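As a quick cross-check of equation 2.32, the same numbers can be plugged into the posterior sketch shown above (again an illustrative helper, not something defined in the text):

    # Likelihoods read off the world state table, as in the text
    lik_h = (2/24) / (12/24)     # P(d=110 | h=1) = .167
    lik_null = (1/24) / (12/24)  # P(d=110 | h-bar) = .083
    prior_h = prior_null = 0.5   # both hypotheses equally likely a priori

    print(posterior(lik_h, prior_h, lik_null, prior_null))  # ~0.67, matching the value from equation 2.23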
2.7.2 Subjective Probabilities
Everything we just did was quite straightforward because we had a world state table, and could therefore compute objective probabilities. However, when more