You should replicate the same basic effect we saw above: attention and binding are more closely related to each other than they are to dyslexia. This can be seen in the cluster plot (figure 10.25) by noting that attention and binding are clustered together. The cosine matrix appears in the terminal window where you started the program (it should look like table 10.12). Here, you can see that the cosine between “attention” and “binding” is .415 (relatively high), while that between “attention” and “dyslexia” is only .090, and that between “binding” and “dyslexia” is only .118.
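For concreteness, here is a minimal sketch of the kind of computation that produces such a cosine matrix; it is not the simulator's actual code, and the weight vectors below are random stand-ins, so the printed values will not match the ones above.

```python
import numpy as np

def cosine(a, b):
    """Cosine between two vectors: dot product divided by the
    product of their lengths (1.0 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in weight vectors: one row of hidden-layer weights per word.
rng = np.random.default_rng(0)
words = ["attention", "binding", "dyslexia"]
W = rng.random((len(words), 20))

for i in range(len(words)):
    for j in range(i + 1, len(words)):
        print(f"cos({words[i]}, {words[j]}) = {cosine(W[i], W[j]):.3f}")
```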
Figure 10.25: Cluster plot of the similarity structure for attention, binding, and dyslexia as produced by the WordMatrix function.
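A cluster plot of this sort can be generated by hierarchical clustering over cosine distances (one minus the cosine). A sketch using SciPy, again with stand-in vectors rather than the network's actual weights:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

words = ["attention", "binding", "dyslexia"]
W = np.random.default_rng(1).random((len(words), 20))

# pdist with metric="cosine" yields 1 - cosine for each pair,
# in the condensed form that linkage() expects.
tree = linkage(pdist(W, metric="cosine"), method="average")

dendrogram(tree, labels=words)
plt.title("Similarity structure (stand-in data)")
plt.show()
```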
Do a WordMatrix for several other words that the network should know about from “reading” this textbook.

Question 10.11 (a) Report the cluster plot and cosine matrix results. (b) Comment on how well this matches your intuitive semantics from having read this textbook yourself.
Distributed Representations via Activity Patterns

To this point we have only used the patterns of weights to the hidden units to determine how similar the semantic representations of different words are. We can also use the actual pattern of activation produced over the hidden layer as a measure of semantic similarity. This is important because it allows us to present multiple word inputs at the same time, and have the network choose a hidden layer representation that best fits this combination of words. Thus, novel semantic representations can be produced as combinations of semantic representations for individual words. This ability is critical for some of the more interesting and powerful applications of these semantic representations (e.g., multiple-choice question answering, essay grading, etc.).
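The following toy version conveys the idea; the real network settles interactively over its connections, which this feedforward sketch ignores (a logistic squash stands in for settling), and all names and vectors are invented for illustration. The ActProbe function described next does the real version of this within the trained network.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["attention", "binding", "invariant", "object", "recognition"]
W = rng.random((len(vocab), 20))  # stand-in input-to-hidden weights

def hidden_pattern(word_set):
    """Activate every word in the set at once and squash the summed
    input, yielding one blended hidden-layer pattern."""
    net = sum(W[vocab.index(w)] for w in word_set)
    return 1.0 / (1.0 + np.exp(-(net - net.mean())))

def probe(set1, set2):
    """Cosine between the hidden patterns for two word sets."""
    a, b = hidden_pattern(set1), hidden_pattern(set2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(probe(["attention"], ["invariant", "object", "recognition"]))
print(probe(["attention", "binding"],
            ["invariant", "object", "recognition"]))
```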
The ActProbe function can be used to do this activation-based probing of the semantic representations.
Select act in the network window. Press ActProbe on the sem_ctrl control panel, and you will be prompted for two sets of words. Let's start with the same example we have used before, entering “attention” for the first word set, and “binding” for the second.

You should see the network activations updating as the word inputs are presented. The result pops up in a window, showing the cosine between the hidden activation patterns for the two sets of words. Notice that this cosine is lower than that produced by the weight-based analysis of the WordMatrix function. This can happen due to the activation dynamics, which can either magnify or minimize the differences present in the weights.
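To get a feel for how a nonlinearity alone can shift similarity, compare the cosine of two made-up patterns before and after a sharp squashing function (loosely standing in for the hidden layer's competitive dynamics); depending on the patterns, the activation-based cosine can come out higher or lower than the weight-based one.

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.array([0.2, 0.9, 0.1, 0.8])  # made-up weight-based patterns
y = np.array([0.3, 0.7, 0.2, 0.6])

# Steep logistic: values near 1 survive, values near 0 are cut off.
f = lambda v: 1.0 / (1.0 + np.exp(-12.0 * (v - 0.5)))

print(f"weight-based cosine:     {cos(x, y):.3f}")
print(f"activation-based cosine: {cos(f(x), f(y)):.3f}")
```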
Next, let's use ActProbe to see how we can sway an otherwise somewhat ambiguous term to be interpreted in a particular way. For example, the term “attention” can be used in two somewhat different contexts. One context concerns the implementational aspects of attention, most closely associated with “competition.” Another context concerns the use of attention to solve the binding problem, which is associated with “invariant object recognition.” Let's begin this exploration by first establishing the baseline association between “attention” and “invariant object recognition.”
Do an ActProbe with “attention” as the first word set, and “invariant object recognition” as the second.
You should get a cosine of around .302. Now, let's see if adding “binding” in addition to “attention” increases the hidden layer similarity.
Do an ActProbe with “attention binding” as the first word set, and “invariant object recognition” again as the second.
The similarity does indeed increase, producing a cosine of around .326. To make sure that there is an in-