this as being a categorical representation of the digits,
where the activation of each digit detector corresponds
to an entire category of possible input patterns.
Ideally, it would be useful to be able to visualize the
transformation process somehow. In the detector exploration
from the previous chapter (section 2.6.3), we
could get a sense of the transformation being performed
by plotting the response profile of the unit across each
of the input patterns. Unfortunately, it is considerably
more difficult to do this for an entire hidden layer with
many units. One useful tool we have at our disposal
is a cluster plot of the similarities of different activity
patterns over the input and hidden layers.
A cluster plot recursively groups together the patterns
or groups of patterns that are most similar to each other.
Specifically, the two most similar patterns are grouped
together first, and then the similarity of this group with
respect to the other patterns is computed as the average
of the similarities of the two patterns in the group. The
grouping process continues by combining the next two
most similar patterns (or groups) with each other, and so
on until everything has been put into groups (clusters).
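The grouping procedure just described can be sketched in a few lines of Python. This is only a minimal illustration, not the simulator's own implementation: the function names (euclidean, average_linkage) are hypothetical, Euclidean distance is assumed as the similarity measure, and the distance between two groups is assumed to be the average of all pairwise distances between their members.

```python
import math
from itertools import combinations


def euclidean(x, y):
    # Distance between two activity patterns of equal length.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(patterns):
    """Repeatedly merge the two closest groups; return the merge history.

    Each history entry is (group_a, group_b, distance), where a group is a
    tuple of pattern indices. Assumption: group-to-group distance is the
    mean of all pairwise distances between members.
    """
    def group_distance(a, b):
        total = sum(euclidean(patterns[i], patterns[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(patterns))]
    merges = []
    while len(clusters) > 1:
        # Find the two most similar (closest) groups.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: group_distance(*pair))
        merges.append((a, b, group_distance(a, b)))
        # Replace the two groups with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


if __name__ == "__main__":
    # Four made-up patterns: two near the origin, two far away.
    toy_patterns = [[0, 0], [0, 1], [5, 5], [5, 6.5]]
    for group_a, group_b, dist in average_linkage(toy_patterns):
        print(group_a, group_b, round(dist, 3))
```

The merge distances recorded here play the role of the horizontal line lengths in a cluster plot: groups that merge at a small distance contain very similar patterns.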
The similarity between patterns is typically computed
using Euclidean distance:
that denotes the grouping. These groups are the critical
things to focus on for understanding how the network
represents the digits. Note that the Y axis here
is just an index across the different patterns and does
not convey any meaningful information (e.g., one could
randomly permute the orders of items within a group
without changing the meaning of the plot). The X axis
shows distance, with the distance among items within a
cluster represented by the length of the horizontal line
along the X axis coming out from their common vertical
grouping line. Note also that the X axis is auto-scaled,
so be sure to look at the actual values there (don't assume
that all X axes across different plots are the same
length).
Figure 3.8a shows a cluster plot of input patterns
where there are three noisy versions of each digit (for
example, the first three patterns in figure 3.7 were the
inputs for the digit 8). There are two important features
of this plot. First, you can see that the noisy versions
of each digit are all clustered in groups, indicating that
they are more similar to each other than to any other
digit. Second, the fact that the different digits form
an elaborate hierarchy of cluster groups indicates that
there is a relatively complex set of similarity relationships
among the different digits as represented in their
input images.
Figure 3.8b shows a cluster plot of the hidden unit
activation patterns for the same set of input images.
Two things have happened here. First, the distinctions
among the different noisy versions of the same digit
have been collapsed (deemphasized): the fact that the
digit labels are flush against the vertical bar depicting
the cluster group means that there is zero distance between
the elements of the group (i.e., they all have an
identical representation in the hidden layer). Second,
the complex hierarchical structure of similarities among
the different digits has been eliminated in favor of a very
uniform similarity structure where each digit is equally
distinct from every other. One can think of this as emphasizing
the fact that each digit is equally distinct from
the others, and that all of the digits form a single group
of equals.
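To see why such a hidden layer yields a uniform similarity structure, consider an idealized case. The sketch below assumes (this is an assumption for illustration, not a statement about the actual network's activation values) that each digit fully activates exactly one of ten hidden units and leaves the rest at zero. Under that assumption, noisy versions of the same digit are at zero distance from each other, and every pair of different digits is at the same distance, the square root of 2.

```python
import math


def euclidean_distance(x, y):
    # Euclidean distance between two activity patterns.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Assumed idealization: one fully active hidden unit per digit.
n_digits = 10
hidden = [[1.0 if unit == digit else 0.0 for unit in range(n_digits)]
          for digit in range(n_digits)]

# Two noisy inputs for the same digit map to the same hidden pattern.
same_digit = euclidean_distance(hidden[8], hidden[8])

# Collect the distances between every pair of different digits.
different_digits = {euclidean_distance(hidden[a], hidden[b])
                    for a in range(n_digits)
                    for b in range(n_digits) if a != b}

print("same digit:", same_digit)
print("distinct values between different digits:", different_digits)
```

Because the set of between-digit distances contains a single value, a cluster plot of these patterns would show all digits joining one group at the same distance, with no intermediate hierarchy.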
A conceptual sketch of the transformation performed
by the digit network is shown in figure 3.9, which is intended
to roughly capture the similarity structure shown
$$d = \sqrt{\sum_i (x_i - y_i)^2} \tag{3.1}$$
where $x_i$ is the value of element $i$ in one pattern, and
$y_i$ is the value of this same element in another pattern.
The data necessary to produce a cluster plot is contained
in a distance matrix, which represents all pairwise distances
between patterns as the elements of the matrix.
The advantage of a cluster plot over the raw distance
matrix is that the visual form of the cluster plot often
makes the similarity information easier to see.
Nevertheless, because the cluster plot reduces the
high-dimensional hidden unit representation to a
two-dimensional figure, information is necessarily lost,
so it is sometimes useful to also consult the distance matrix
itself.
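As a concrete illustration, the following sketch computes the Euclidean distance between patterns and assembles the full distance matrix. The function names and the three example patterns are made up for illustration; they are not taken from the digit network.

```python
import math


def euclidean_distance(x, y):
    # d = sqrt(sum_i (x_i - y_i)^2) for two patterns of equal length.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def distance_matrix(patterns):
    # Entry [r][c] holds the distance between pattern r and pattern c.
    return [[euclidean_distance(p, q) for q in patterns] for p in patterns]


if __name__ == "__main__":
    # Hypothetical activity patterns over four units.
    example_patterns = [
        [1, 0, 1, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
    ]
    for row in distance_matrix(example_patterns):
        print(" ".join(f"{d:5.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, since each pattern is at zero distance from itself. A cluster plot is built from exactly these numbers.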
Figure 3.8 shows two cluster plots that illustrate how
the digit network transformation emphasizes some distinctions
and deemphasizes others. The different groups
are shown as horizontal leaves off of a vertical branch