Networks of Neurons - Computational Explorations in Cognitive Neuroscience


this as being a categorical representation of the digits, where the activation of each digit detector corresponds to an entire category of possible input patterns.

Ideally, it would be useful to be able to visualize the transformation process somehow. In the detector exploration from the previous chapter (section 2.6.3), we could get a sense of the transformation being performed by plotting the response profile of the unit across each of the input patterns. Unfortunately, it is considerably more difficult to do this for an entire hidden layer with many units. One useful tool we have at our disposal is a cluster plot of the similarities of different activity patterns over the input and hidden layers.

A cluster plot recursively groups together the patterns or groups of patterns that are most similar to each other. Specifically, the two most similar patterns are grouped together first, and then the similarity of this group with respect to the other patterns is computed as the average of the similarities of the two patterns in the group. The grouping process continues by combining the next two most similar patterns (or groups) with each other, and so on until everything has been put into groups (clusters).
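The grouping procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's own code: the function names `euclidean` and `cluster` and the toy patterns are assumptions made for the example. The distance from a newly formed group to each remaining item is taken as the plain average of the two merged members' distances, following the description above.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cluster(patterns):
    """Greedily group the closest patterns (or groups) until one group remains.

    patterns: dict mapping a label to a list of activation values.
    Returns the merges in order, each as (group_a, group_b, distance).
    """
    # Every pattern starts out as its own one-element group (a tuple of labels).
    clusters = [(label,) for label in patterns]
    dist = {
        frozenset((a, b)): euclidean(patterns[a[0]], patterns[b[0]])
        for a, b in combinations(clusters, 2)
    }
    merges = []
    while len(clusters) > 1:
        # Pick the two closest groups currently available.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        merges.append((a, b, dist[frozenset((a, b))]))
        merged = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        # Distance from the new group to each remaining group is the
        # average of the distances from its two merged members.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                dist[frozenset((a, c))] + dist[frozenset((b, c))]
            ) / 2
        clusters.append(merged)
    return merges


# Hypothetical toy patterns: two noisy versions each of two items.
toy = {
    "A1": [1.0, 0.0, 0.0],
    "A2": [0.9, 0.1, 0.0],
    "B1": [0.0, 1.0, 1.0],
    "B2": [0.0, 0.9, 1.0],
}
merges = cluster(toy)
for group_a, group_b, d in merges:
    print(group_a, group_b, round(d, 3))
```

Each entry in `merges` records which two groups were joined and at what distance, which is the information a cluster plot draws as a branch.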

The similarity between patterns is typically computed using Euclidean distance (equation 3.1).

that denotes the grouping. These groups are the critical things to focus on for understanding how the network represents the digits. Note that the Y axis here is just an index across the different patterns and does not convey any meaningful information (e.g., one could randomly permute the orders of items within a group without changing the meaning of the plot). The X axis shows distance, with the distance among items within a cluster represented by the length of the horizontal line along the X axis coming out from their common vertical grouping line. Note also that the X axis is auto-scaled, so be sure to look at the actual values there (don't assume that all X axes across different plots are the same length).

Figure 3.8a shows a cluster plot of input patterns where there are three noisy versions of each digit (for example, the first three patterns in figure 3.7 were the inputs for the digit 8). There are two important features of this plot. First, you can see that the noisy versions of each digit are all clustered in groups, indicating that they are more similar to each other than to any other digit. Second, the fact that the different digits form an elaborate hierarchy of cluster groups indicates that there is a relatively complex set of similarity relationships among the different digits as represented in their input images.

Figure 3.8b shows a cluster plot of the hidden unit activation patterns for the same set of input images. Two things have happened here. First, the distinctions among the different noisy versions of the same digit have been collapsed (deemphasized): the fact that the digit labels are flush against the vertical bar depicting the cluster group means that there is zero distance between the elements of the group (i.e., they all have an identical representation in the hidden layer). Second, the complex hierarchical structure of similarities among the different digits has been eliminated in favor of a very uniform similarity structure where each digit is equally distinct from every other. One can think of this as emphasizing the fact that each digit is equally distinct from the others, and that all of the digits form a single group of equals.
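A small numerical check can make this flat similarity structure concrete. The sketch below assumes an idealized hidden layer in which exactly one unit is fully active for each digit and all others are silent. That is a simplification introduced here for illustration, as are the names `one_hot` and `n_digits` and the choice of ten digits. Under that assumption, every pair of different digits is the same distance apart, and repeated presentations of one digit are zero distance apart.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def one_hot(index, size):
    """Idealized hidden pattern: one active unit, all others at zero."""
    return [1.0 if i == index else 0.0 for i in range(size)]


n_digits = 10  # assumed number of digit categories
hidden = {digit: one_hot(digit, n_digits) for digit in range(n_digits)}

# Distances between every pair of different digits, collected as a set
# so that repeated values collapse to a single entry.
between = {euclidean(hidden[a], hidden[b]) for a, b in combinations(hidden, 2)}

# Distance between two presentations of the same digit that receive
# an identical hidden representation.
within = euclidean(hidden[8], hidden[8])

print("distinct between-digit distances:", between)
print("within-digit distance:", within)
```

Because `between` contains a single value, a cluster plot of such patterns has no hierarchy to draw: all digits join at the same distance.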

A conceptual sketch of the transformation performed by the digit network is shown in figure 3.9, which is intended to roughly capture the similarity structure shown

$$d = \sqrt{\sum_i (x_i - y_i)^2} \qquad (3.1)$$

where $x_i$ is the value of element $i$ in one pattern, and $y_i$ is the value of this same element in another pattern.

The data necessary to produce a cluster plot is contained in a distance matrix, which represents all pairwise distances between patterns as the elements of the matrix. The advantage of a cluster plot over the raw distance matrix is that the visual form of the cluster plot often makes the similarity information easier to see. Nevertheless, because the cluster plot reduces the high dimensionality of the hidden unit representation into a two-dimensional figure, information is necessarily lost, so sometimes it is useful to also use the distance matrix itself.
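As a rough illustration of what such a distance matrix holds, the following sketch computes all pairwise Euclidean distances for three made-up patterns. The helper names `euclidean` and `distance_matrix` and the pattern values are assumptions for the example, not part of the simulator.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length patterns."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def distance_matrix(patterns):
    """Return an n-by-n list of lists; entry [i][j] is the distance
    between pattern i and pattern j."""
    n = len(patterns)
    return [
        [euclidean(patterns[i], patterns[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical activity patterns over four units.
patterns = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]
matrix = distance_matrix(patterns)
for row in matrix:
    print(" ".join(f"{d:5.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal carry independent information. A cluster plot summarizes those entries, while the matrix keeps every pairwise value.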

Figure 3.8 shows two cluster plots that illustrate how the digit network transformation emphasizes some distinctions and deemphasizes others. The different groups are shown as horizontal leaves off of a vertical branch
