dimensional world onto a much lower-dimensional sensory organ. A projection is a mathematical operation that maps points in a higher-dimensional space onto a lower-dimensional one: think of Plato's cave, where the three-dimensional world casts flat shadows on the wall.
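This loss of information under projection can be sketched in a few lines of code. The matrix below is a made-up example, not taken from any particular model of vision; it simply drops the depth axis, the way a shadow does.

```python
import numpy as np

# Hypothetical projection for illustration: a 2x3 matrix that keeps
# the x and y coordinates of a 3-D point and discards its depth (z).
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

near = np.array([2.0, 3.0, 1.0])   # a world point close to the viewer
far  = np.array([2.0, 3.0, 9.0])   # a different point, much farther away

# Two distinct world states collapse onto the same sensory input,
# so inverting the projection (recovering z) is under-constrained.
print(P @ near)  # [2. 3.]
print(P @ far)   # [2. 3.]
```

Because many different world states map to the identical sensory pattern, no amount of scrutiny of a single input can tell them apart, which is exactly what makes the inverse problem ill-posed.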
Inverting these projected world states back into
something resembling the world is difficult because so
much information is lost in this projection down onto
our senses. Marr (1982) characterized this situation by saying that vision is an ill-posed problem: the input data do not sufficiently constrain our interpretation of them, so it is difficult to decide among the large number of possible internal models that could fit the sensory input equally well. Put another way, it is difficult to know which are the real underlying causes of our perceptions, and which are mere coincidences or appearances (i.e., noise).
The model learning problem can be made easier by
integrating across many individual experiences. Thus,
although any single observation may be ambiguous and
noisy, over time, the truth will eventually shine through!
In science, this is known as reproducibility — we only
believe in phenomena that can be reliably demonstrated
across many different individual experiments in differ-
ent labs. The law of large numbers also says that noise
can be averaged away by integrating statistics over a
large sample. We will see that this integration process
is critical to successful model learning, and that it is
naturally performed by slowly adding up small weight
changes so that the resulting weights represent aggre-
gate statistics over a large sample of experiences. Thus,
the network ends up representing the stable patterns that
emerge over a wide range of experiences with the world.
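The idea of slowly accumulating small weight changes can be illustrated with a minimal sketch. The "true pattern," noise level, and learning rate below are all arbitrary choices for demonstration, not values from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
true_pattern = np.array([1.0, -0.5, 0.25])  # stable structure in the "world"
w = np.zeros(3)                             # weights start uninformative
lrate = 0.01                                # small steps = slow integration

for _ in range(5000):
    # Each individual experience is the true pattern buried in heavy noise.
    observation = true_pattern + rng.normal(0.0, 1.0, size=3)
    # Move the weights a tiny step toward each observation; over many
    # steps this computes a running average of experience.
    w += lrate * (observation - w)

# The weights now approximate the true pattern; the noise, whose
# average is zero, has largely washed out.
print(np.round(w, 2))
```

Note that any single observation here is dominated by noise, yet the aggregate statistics in the weights recover the underlying pattern, which is the law-of-large-numbers point made above.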
However, just aggregating over many experiences is
not enough to enable the development of a good inter-
nal model. For example, if you just averaged over all
the pixel intensities of the images experienced by the
retina, you would just get a big gray wash. Thus, one
also needs to have some prior expectations or biases
about what kinds of patterns are particularly informa-
tive, and how to organize and structure representations
in a way that makes sense given the general structure
of the world. If these biases provide a reasonable fit to
the properties of the actual world, then model learning
becomes easier and more reliable across individuals.
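The "big gray wash" point can be demonstrated directly. The images below are hypothetical stand-ins for retinal input, just random pixel intensities, but the conclusion holds for any sufficiently varied collection: a raw pixel average destroys structure rather than extracting it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for 1000 experienced images, each 8x8, intensities in [0, 1].
images = rng.random((1000, 8, 8))

# Naive model learning: just average all the pixel intensities.
mean_image = images.mean(axis=0)

# Every pixel ends up near the same mid-gray value, with almost no
# spread across pixels: a uniform gray wash, not a useful model.
print(mean_image.mean())  # close to 0.5
print(mean_image.std())   # close to 0
```

This is why the integration process needs to be shaped by appropriate biases about which patterns are informative, rather than applied indiscriminately to the raw input.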
For example, let's imagine that you were faced with
the task of learning to control a complex new video
game using the keyboard, with no explicit help avail-
able. If you had grown up playing video games, you
would know in advance what kinds of actions a video
game protagonist generally makes, and which sets of
keys are likely to control these actions. In other words,
you would have a set of appropriate a priori biases to
structure your internal model of video games. You
might have some uncertainty about some of the details,
but a few quick experiments will have you playing in
no time. In contrast, a complete video game novice will
have to resort to trying all the keys on the keyboard,
observing the corresponding responses of the protago-
nist, and slowly compiling an inventory. If there are
any interdependencies between these actions (e.g., you
can only “kick” after the “jump” button has been pressed), then the task becomes that much more difficult, because the space of possible contingencies is even
greater. Thus, appropriate biases can make model learn-
ing much easier. However, it is essential to keep in
mind that if your biases are wrong (e.g., the video game
is completely unlike any other), then learning can take
even longer than without these biases (e.g., you persist
in trying a subset of keys, while the really important
ones are never tried).
In this video game example, the biases came from
prior experience and can be attributed to the same kinds
of learning processes as the novice would use. Al-
though this type of biasing happens in our networks
too, understanding the kinds of biases that are present
from the very start of learning presents more of a chal-
lenge. We assume that evolution has built up over mil-
lions of years a good set of biases for the human brain
that facilitate its ability to learn about the world. This
essential role for genetic predispositions in learning is
often under-emphasized by both proponents and crit-
ics of neural network learning algorithms, which are
too often characterized as completely tabula rasa (blank
slate) learning systems. Instead, these learning mech-
anisms reflect a dynamic interplay between genetic bi-
ases and experience-based learning. The difficulty is
that the genetic-level biases are not typically very obvi-
ous or easily describable, and so are often overlooked.