than just a few inputs are present, a table like that in figure 2.21 becomes intractably large due to the huge number of different unique combinations of input states. For example, if the inputs are binary (which is not actually true for neurons, so it's even worse), the table requires 2^(n+1) entries for n inputs, with the extra factor of two (accounting for the +1 in the exponent) reflecting the fact that all possibilities must be considered twice, once under each hypothesis. This is roughly 1.1x10^301 for just 1,000 inputs (and our calculator gives Inf as a result if we plug in a conservative guess of 5,000 inputs for a cortical neuron). This is the main reason why we need to develop subjective ways of computing probabilities.
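To get a concrete feel for these numbers, here is a minimal sketch (in Python; ours, not part of the original text) of the table-size arithmetic just described:

```python
# Size of the objective probability table for n binary input sources:
# 2**n distinct input patterns, each counted once under each hypothesis
# (hypothesis true / hypothesis false), giving 2**(n+1) entries in total.

def table_entries(n: int) -> int:
    """Number of table entries needed for n binary input sources."""
    return 2 ** (n + 1)

for n in (3, 1000, 5000):
    entries = table_entries(n)
    # Report the order of magnitude rather than printing the full integer.
    print(f"n = {n:5d}: about 10**{len(str(entries)) - 1} entries")

# n =     3: about 10**1 entries    (16 -- easy to tabulate by hand)
# n =  1000: about 10**301 entries
# n =  5000: about 10**1505 entries (far beyond any float, hence "Inf")
```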
As we have stated, the main way we avoid using a table of objective probabilities is to use likelihood terms that can be computed directly as a function of the input data and the specification of the hypothesis, without reference to objective probabilities and the requisite table. When we directly compute a likelihood function, we effectively make a set of assumptions about the nature of the hypothesis and its relationship with the data, and then compute the likelihood under these assumptions. In general, we have no way of validating these assumptions (which would require the intractable table), so we must instead evaluate the plausibility of the assumptions and their relationship to the hypotheses.

One plausible assumption about the likelihood function for a detector is that it is directly (linearly) proportional to the number of inputs that match what the detector is trying to detect. Thus, we use a set of parameters to specify to what extent each input source is representative of the hypothesis that something interesting is "out there." These parameters are just our standard weight parameters w. Together with the linear proportionality assumption, this gives a likelihood function that is a normalized linear function of the weighted inputs:

P(d|h) = \frac{1}{z} \sum_i w_i d_i \qquad (2.33)

where d_i is the value of one input source i (e.g., d_i = 1 if that source detected something, and 0 otherwise), and the normalizing term z ensures that the result is a valid probability between 0 and 1. We will see in a moment that we need not be too concerned with the value of z.
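Equation 2.33 is simple enough to state directly in code. The following is a minimal Python sketch of such a detector likelihood (ours, not from the text); in particular, normalizing by the sum of the weights when no z is supplied is just one convenient assumption that keeps the result between 0 and 1:

```python
from typing import Optional, Sequence

def likelihood(d: Sequence[float], w: Sequence[float],
               z: Optional[float] = None) -> float:
    """P(d|h) from equation 2.33: (1/z) * sum_i(w_i * d_i).

    If z is not given, the sum of the weights is used, which keeps the
    result in [0, 1] for inputs in [0, 1] (an illustrative choice; the
    text notes that the exact value of z turns out not to matter much).
    """
    if z is None:
        z = sum(w) or 1.0   # guard against all-zero weights
    return sum(wi * di for wi, di in zip(w, d)) / z

# A detector with three input sources, all weighted 1:
w = [1.0, 1.0, 1.0]
print(likelihood([1, 1, 1], w))   # 1.0   -- every relevant source is active
print(likelihood([0, 1, 0], w))   # ~0.33 -- one of three sources is active
print(likelihood([0, 0, 0], w))   # 0.0   -- nothing detected
```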
First, we want to emphasize what has been done here. Equation 2.33 means that input patterns d become more probable when there is activity on input sources that are thought to reflect the presence of something of interest in the world, as parameterized by the weight value w_i. Thus, if w_i = 1, we care about that input, but if it is 0, we don't care (because it is not relevant to our hypothesis). Furthermore, the overall likelihood is just the (normalized) sum of all these individual source-level contributions; our detector does not represent interactions among the inputs. The beauty of the Bayesian framework is that it enables us to use these definitions (or any others that we might also find plausible) to then compute, in a rational manner, the extent to which we should believe a given hypothesis to be true in the context of a particular data input. Of course, garbage-in gives garbage-out, so the whole thing rests on how good (plausible) the likelihood definition is.

In effect, what we have done with equation 2.33 is to provide a definition of exactly what the hypothesis h is, by explicitly stating how likely any given input pattern would be assuming this hypothesis were true. The fact that we are defining probabilities, not measuring them, makes these probabilities subjective. They no longer correspond to frequencies of objectively measurable events in the world. Nevertheless, by working out our equations in the previous section as if we had objective probabilities, and establishing a self-consistent mathematical framework via Bayes' formula, we are assured of using our subjective probabilities in the most "rational" way possible.

The objective world defined by the state table in figure 2.21 corresponds to the definition of the likelihood given by equation 2.33, because the frequency (objective probability) of each input state when the hypothesis is true is proportional to the number of active inputs in that state; this is exactly the assumption we made in constructing equation 2.33. As you can verify yourself, the equation for the likelihood in this objective world is

P(d|h) = \frac{1}{z} \sum_{i=1}^{3} d_i \qquad (2.34)

where we assume that the weights for the 3 input sources are all 1 (figure 2.23).
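Since you are invited to verify this correspondence yourself, here is a small Python sketch of that check. The construction of the "objective world" table below is our own assumption for illustration (figure 2.21 itself is not reproduced here): each pattern's frequency under the hypothesis is simply taken to be proportional to its number of active inputs, and z is chosen to make those frequencies sum to 1.

```python
from itertools import product

# All 2**3 binary input patterns for a 3-input detector.
patterns = list(product([0, 1], repeat=3))

# Assumed objective world: when the hypothesis is true, each pattern
# occurs with a frequency proportional to its number of active inputs.
freq_given_h = {d: sum(d) for d in patterns}
z = sum(freq_given_h.values())   # normalizer, so the frequencies sum to 1

# Equation 2.34: likelihood with all three weights equal to 1.
for d in patterns:
    print(d, freq_given_h[d] / z)

# (0, 0, 0) gets probability 0, each single-input pattern gets 1/12,
# two-input patterns get 2/12, and (1, 1, 1) gets 3/12 -- proportional
# to the number of active inputs, exactly as equation 2.33 assumes.
```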
To illustrate the impor-