tent and has higher energy. A system that has satisfied the constraints is at a lower state of energy, and nature always tries to settle into a lower energy state by satisfying more constraints.
A simple example of an energy function is a Euclidean (sum of squares) distance function between two objects (a and b) that don't like to be separated (e.g., two magnets with opposite poles facing each other). An energy function for such a system would be:
$E = (x_a - x_b)^2 + (y_a - y_b)^2$    (3.8)
in two dimensions with x and y coordinates. As the two objects get closer together, the distance value obviously gets smaller, meaning that the constraint of having the objects closer together gets more satisfied according to this energy function. As we will see, energy functions often have this "squared" form of the distance function.
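To make this concrete, here is a minimal Python sketch (the coordinate values are arbitrary assumptions for illustration) that computes the squared-distance energy of equation 3.8 and shows it falling as object a approaches object b:

```python
def energy(a, b):
    """Equation 3.8: E = (x_a - x_b)^2 + (y_a - y_b)^2 for 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

b = (0.0, 0.0)                  # object b fixed at the origin
for step in (4.0, 2.0, 1.0, 0.5):
    a = (step, step)            # object a moves progressively closer
    print(f"a={a}  E={energy(a, b):.2f}")
# E falls from 32.00 to 0.50 as the distance shrinks: lower energy
# means the "stay together" constraint is better satisfied.
```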
In a network of neurons, a state that has satisfied more constraints can also be thought of as having a lower energy. When we apply the mathematics of these energy functions to networks, we find that the simple act of updating the activations of the units in the network results in the same kind of settling into a lower energy state by satisfying more constraints. The standard form of the network energy function is as follows:
$E = -\frac{1}{2} \sum_i \sum_j x_i w_{ij} x_j$    (3.9)
where $x_i$ and $x_j$ represent the sending and receiving unit activations, respectively, and $w_{ij}$ is the weight connecting them. Note that each unit in the network appears as both a sender and a receiver in the double-sum term; it is this double-counting that motivates the $\frac{1}{2}$ factor in the equation, as we will see.
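To see why the double-counting calls for the $\frac{1}{2}$ factor, expand the double sum for a single pair of connected units (a worked special case, assuming no self-connections; this expansion is ours, not from the text):

$E = -\frac{1}{2}\left( x_1 w_{12} x_2 + x_2 w_{21} x_1 \right) = -x_1 w_{12} x_2$ when $w_{12} = w_{21}$

Each connected pair thus contributes a single energy term, even though it appears twice in the double sum.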
It is important that the weights connecting two units be at least roughly symmetric, so that the influences of one unit are reciprocated by those of the other unit. This symmetry enables the network to find a single consistent state that satisfies all of the units; if the weights were not symmetric, then one unit could be "satisfied" but the other "unsatisfied" with the same state, which would obviously not lead to a consistent global state. We will discuss the biological basis for weight symmetry in chapter 5, where it plays an important role in mathematically deriving a biologically plausible form of error-driven learning. As we will see, only a very rough, not exact, form of symmetry is required.
The constraints are represented in this function as the extent to which the activations are consistent with the weights. Thus, if there is a large weight between two units, and these units are strongly active, they will contribute to a smaller energy value (because of the minus sign). This opposition of the magnitude and the sign of the energy term is confusing, so we will use the negative value of the energy function, which is called harmony (Smolensky, 1986):

$H = \frac{1}{2} \sum_i \sum_j x_i w_{ij} x_j$    (3.10)

Here, two units are said to contribute to greater harmony (also known as lower energy) if they are strongly active and connected by a large weight. In this terminology, the network settling acts to increase the overall harmony of the activation states.
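To preview this numerically, here is a minimal settling sketch in Python (the three-unit network, its weight values, the clamped input unit, and the clipping of activations to [0, 1] are all illustrative assumptions, not specifics from the text):

```python
import numpy as np

W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])    # symmetric weights, zero self-connections

def harmony(x, W):
    """H = 1/2 * sum_ij x_i w_ij x_j  (equation 3.10)."""
    return 0.5 * x @ W @ x

x = np.array([1.0, 0.0, 0.0])      # unit 0 is clamped as external input
for sweep in range(5):
    for j in (1, 2):               # update only the unclamped units
        x[j] = np.clip(W[:, j] @ x, 0.0, 1.0)   # x_j = sum_i x_i w_ij, clipped
    print(f"sweep {sweep}: x={np.round(x, 3)}  H={harmony(x, W):.3f}")
```

Each sweep drives the unclamped units toward their net input, and the printed harmony value rises monotonically until the network settles.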
To see how this occurs, let's take the simplest case of a linear unit that computes its activation as follows:

$x_j = \sum_i x_i w_{ij}$    (3.11)
What we want to do is to show that updating the activation according to this equation is equivalent to maximizing harmony in the network. One standard way to maximize an equation is to figure out how its value changes as a result of changes to the constituent variables. This requires taking the derivative of the function with respect to the variable in question (activation $x_j$ in this case). We will discuss this process of maximizing or minimizing functions using derivatives in greater detail in chapter 5, so if you don't understand it, take it on faith now, and you can come back to this point after you have read that chapter (where it plays a very central role in developing one of our learning mechanisms).
If we take the derivative of the harmony equation (3.10) with respect to one unit's activation value ($x_j$), we get:

$\frac{\partial H}{\partial x_j} = \sum_i x_i w_{ij}$    (3.12)
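This derivative is exactly the net input of equation 3.11, so moving a unit's activation toward its net input moves the network uphill in harmony. As a quick sanity check, the following Python sketch (with arbitrary, assumed weights and activations) compares a finite-difference estimate of $\partial H / \partial x_j$ against $\sum_i x_i w_{ij}$:

```python
import numpy as np

W = np.array([[ 0.0, 0.3, -0.2],
              [ 0.3, 0.0,  0.4],
              [-0.2, 0.4,  0.0]])   # symmetric weights (assumed values)
x = np.array([0.8, 0.5, 0.9])       # arbitrary activations

def harmony(x):
    return 0.5 * x @ W @ x          # equation 3.10

j, eps = 1, 1e-6
xp, xm = x.copy(), x.copy()
xp[j] += eps
xm[j] -= eps
numeric  = (harmony(xp) - harmony(xm)) / (2 * eps)   # finite difference
analytic = W[:, j] @ x              # sum_i x_i w_ij  (equation 3.12)
print(f"numeric={numeric:.6f}  analytic={analytic:.6f}")  # both approx. 0.6
```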