It is useful to contrast the kinds of genetic predispo-
sitions that can be important for biasing neural network
learning with those that are typically discussed by na-
tivists (psychologists who emphasize the genetic con-
tributions to behavior). As emphasized by Elman et al.
(1996), most nativists think in terms of people being
born with specific knowledge or representations (e.g.,
the knowledge that solid things are generally impen-
etrable; Spelke, Breinlinger, Macomber, & Jacobson,
1992). In neural network terms, building in this kind of
specific knowledge would require a very detailed pat-
tern of weights, and is relatively implausible given how
much information the genome would have to contain
and how difficult it would be for this to be expressed
biologically using known developmental mechanisms
such as concentration gradients of various growth and
cellular adhesion factors.
In contrast, architectural biases (e.g., which areas are
generally connected with which other areas) and para-
metric biases (e.g., how fast one area learns compared
to another, or how much inhibition there is in different
areas) can presumably be relatively easily encoded in
the genome and expressed through the actions of regu-
latory factors during development. Because even subtle
differences in these biases can lead to important differ-
ences in learning, it is often not easy to characterize the
exact nature of the biological biases that shape learning.
Nevertheless, we will see that more general aspects of
the biology of networks (specifically the role of inhibi-
tion) and the biology of learning (specifically its asso-
ciative or Hebbian character) serve as important biases
in model learning.
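A toy simulation (ours, not the text's) can make the parametric-bias point concrete. Assuming two otherwise-identical Hebbian learners that see the same experience from the same starting weights, a single parametric bias (here the learning rate; the function name hebbian_weights and all parameter values are illustrative) is enough to leave them with different learned representations:

```python
import math
import random

def hebbian_weights(patterns, lrate, epochs=50):
    # Simple normalized Hebbian rule: dw = lrate * y * x, with w rescaled
    # to unit length so it tracks the dominant correlation direction in
    # the inputs instead of growing without bound.
    random.seed(0)  # identical initial weights for both learners
    w = [random.gauss(0, 1), random.gauss(0, 1)]
    for _ in range(epochs):
        for x in patterns:
            y = w[0] * x[0] + w[1] * x[1]        # unit activation
            w = [w[0] + lrate * y * x[0],
                 w[1] + lrate * y * x[1]]        # Hebbian weight update
            n = math.hypot(w[0], w[1])
            w = [w[0] / n, w[1] / n]             # keep |w| = 1
    return w

# Identical experience for both learners: correlated 2-D input patterns.
random.seed(1)
patterns = [(random.gauss(1.0, 0.3), random.gauss(0.5, 0.3))
            for _ in range(20)]

w_slow = hebbian_weights(patterns, lrate=0.01)   # conservative parametric bias
w_fast = hebbian_weights(patterns, lrate=0.5)    # aggressive parametric bias

# Same data, same starting weights: the learning-rate bias alone leaves
# the two learners with different final weight vectors.
diff = max(abs(a - b) for a, b in zip(w_slow, w_fast))
print(round(diff, 3))
```

The slow learner settles near the dominant correlation in the whole input set, while the fast learner is pulled strongly toward its most recent inputs, so even this subtle difference in a single parameter changes what is learned.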
A more pragmatic issue that arises in the context of
introducing biases into learning is that it requires more
work from the modeler. Essentially, modelers have to
replace the powerful parameter selection process of nat-
ural selection and evolution with their own hunches and
trial-and-error experience. For this reason, many people
have shied away from using more strongly biased mod-
els, favoring the use of very general-purpose learning
principles instead. Unfortunately, there is a basic under-
lying tradeoff in the use of biases in learning, so there
is “no free lunch” — no way to achieve the benefits of
appropriate biases without paying the price of finding
good ones for a given task (Wolpert, 1996a, 1996b).
This tradeoff in the role of biases in learning has long
been appreciated in the statistics community, where it
goes by the name of the bias-variance dilemma (Ge-
man, Bienenstock, & Doursat, 1992). It is a dilemma
because if learners rely on their biases too strongly, they
will not learn enough from their actual experiences, and
will thus end up getting the wrong model (think of the
experienced video game player who misses the impor-
tant new keys in favor of using only the familiar ones).
On the other hand, if learners rely on their experiences
too strongly, then, assuming each learner has somewhat
idiosyncratic experiences, they will all end up with dif-
ferent models that reflect these idiosyncrasies. Hence,
there will be a lot of model variance, which, assuming
there is one real underlying state of the world, also
means a lot of wrong models. Thus, the proper weight-
ing of biases versus experience is a true dilemma (trade-
off), and there really are no generally optimal solutions.
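A toy simulation (again ours, not the text's) illustrates the dilemma. Assume one "real underlying state of the world" (the illustrative function true_f below), a strongly biased learner that ignores its inputs and predicts the average target, and a weakly biased learner that memorizes its idiosyncratic experience (the names mean_model and nn_model are ours); across many resampled experiences, the memorizer's predictions vary far more:

```python
import random

def mean_model(xs, ys):
    # Strongly biased learner: ignores x entirely and predicts the
    # average target, so its bias dominates its experience.
    avg = sum(ys) / len(ys)
    return lambda x: avg

def nn_model(xs, ys):
    # Weakly biased learner: memorizes its experience and predicts the
    # target of the nearest stored example (1-nearest-neighbor).
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

random.seed(0)
true_f = lambda x: 2 * x   # one real underlying state of the world

def predictions(fit, n_datasets=200, n_points=20, x0=0.5):
    # Each dataset is one learner's idiosyncratic, noisy experience;
    # we record each resulting model's prediction at the same point x0.
    preds = []
    for _ in range(n_datasets):
        xs = [random.random() for _ in range(n_points)]
        ys = [true_f(x) + random.gauss(0, 0.5) for x in xs]
        preds.append(fit(xs, ys)(x0))
    return preds

def variance(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

var_biased = variance(predictions(mean_model))
var_memorizer = variance(predictions(nn_model))
print(var_biased < var_memorizer)
```

The biased learner's predictions agree across experiences (though they can agree on the wrong answer), while the memorizer's predictions scatter with each learner's particular samples; neither end of the tradeoff is optimal in general.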
The history of scientific research contains many ex-
amples where biases were critical for seeking out and
making sense of phenomena that would have other-
wise remained incomprehensible. However, some of
the best-known examples of bias in science are negative
ones, where religious or other beliefs interfered with the
ability to understand our world as it truly is. Thus, sci-
ence provides an excellent example of the double-edged
nature of biases.
One of the most pervasive biases in science is that
favoring parsimonious explanations. We will see that
model learning can also be biased in favor of develop-
ing relatively simple, general models of the world in
several ways. The parsimony bias was espoused most
famously by William of Occam in the 1300s, after whom
the phrase Occam's razor, which cuts in favor of the
simplest explanation for a phenomenon, was named.
One of the primary practical advantages of developing
parsimonious models of the world is that this often re-
sults in greater success in generalization, which is the
application of models to novel situations. Think of it
this way: if you just memorized a bunch of specific
facts about the world instead of trying to extract the
simpler essential regularity underlying these facts, then
you would be in trouble when dealing with a novel sit-
uation where none of these specifics were relevant. For
example, if you encode a situation where a tiger leaps