Using SOM_PAK, initial weights of neurons were set randomly and training then proceeded
in two stages. During the first stage, 20 000 training runs were performed, with an
initial neighbourhood size of 250. This stage serves to represent major, 'global' structures
among the input data. The second stage then aims at shaping the representation of regional
and local structures. It consisted of 100 000 training runs, with a starting neighbourhood
size of 25 neurons. Training took 47 minutes for the first stage and 313 minutes for the
second stage (wall clock time), on a single-CPU 2.3 GHz Xeon processor system.
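The two-stage scheme can be illustrated with a minimal NumPy sketch. This is not SOM_PAK's implementation: the toy map size, the learning rates, the stand-in data, the random sampling of input vectors and the helper names hex_grid and train_stage are all assumptions made for illustration. Only the pattern follows the text, namely a short first stage with a wide neighbourhood followed by a longer second stage with a narrow one.

```python
import numpy as np

def hex_grid(xdim, ydim):
    """Return (xdim*ydim, 2) neuron coordinates on a hexagonal lattice."""
    cols, rows = np.meshgrid(np.arange(xdim), np.arange(ydim))
    x = cols + 0.5 * (rows % 2)           # odd rows shifted by half a cell
    y = rows * np.sqrt(3) / 2
    return np.column_stack([x.ravel(), y.ravel()])

def train_stage(weights, grid, data, steps, radius0, alpha0, rng):
    """One training stage, updating `weights` in place.

    Assumed schedule: the neighbourhood radius shrinks linearly from
    radius0 to 1 and the learning rate linearly from alpha0 to 0.
    """
    for t in range(steps):
        frac = t / steps
        radius = 1.0 + (radius0 - 1.0) * (1.0 - frac)
        alpha = alpha0 * (1.0 - frac)
        x = data[rng.integers(len(data))]            # pick one input vector
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # 'Bubble' neighbourhood: every neuron within `radius` of the BMU
        near = np.linalg.norm(grid - grid[bmu], axis=1) <= radius
        weights[near] += alpha * (x - weights[near])
    return weights

rng = np.random.default_rng(0)
data = rng.random((500, 4))               # stand-in for the climate vectors
grid = hex_grid(12, 8)                    # toy map, far smaller than the real one
weights = rng.random((len(grid), data.shape[1]))    # random initial weights

# Stage 1: fewer steps, wide neighbourhood -> global structure
train_stage(weights, grid, data, steps=2000, radius0=10, alpha0=0.05, rng=rng)
# Stage 2: more steps, narrow neighbourhood -> regional and local structure
train_stage(weights, grid, data, steps=10000, radius0=2, alpha0=0.02, rng=rng)
```

The second stage keeps the 1:5 ratio of runs and the tenfold-smaller neighbourhood only loosely; the actual values used in the experiment are those given in the text.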
In retrospect, given the number of neurons and the number of input vectors, a larger
number of runs might have been preferable, as seen in some of the visualizations below.
However, one important lesson to be conveyed in this chapter is that, in order to visualize
n-dimensional data, one will often proceed in an iterative manner, especially when it comes
to detecting anomalies in the data and setting training parameters. Experience shows that
visualization is in fact not only uniquely suited to generating knowledge about the mapped
domain as such, but it is also a powerful tool for testing and refining visualization methods
and data. For example, note below the discussion on the effects of the wind speed variable
on the trained SOM.
SOM_PAK was also used to 'map' input vectors onto the trained SOM. Every block group
climate vector was compared with all neuron vectors to find the most similar neuron.
SOM_PAK thus produced two output files. One is the trained SOM, also known as the
codebook file, which consists of a list of all neurons and their final weights for all variables.
The other output contains information about the best-matching neuron found for each
input vector.
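The mapping step amounts to a nearest-neighbour search in attribute space. The following sketch assumes Euclidean distance and uses small made-up arrays; the function name best_matching_neurons is hypothetical and does not correspond to any SOM_PAK routine.

```python
import numpy as np

def best_matching_neurons(codebook, vectors):
    """For each input vector, return the index of the closest neuron
    (Euclidean distance) and the distance to it."""
    # Pairwise squared distances, shape (n_vectors, n_neurons)
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), np.sqrt(d2.min(axis=1))

# Toy codebook with three neurons and three input vectors (illustrative only)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
vectors = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]])

bmu, dist = best_matching_neurons(codebook, vectors)
print(bmu)   # [0 1 2]
```

For a large SOM and many input vectors, the full distance matrix may not fit in memory, in which case the input vectors would be processed in chunks.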
While SOM_PAK has only rudimentary visualization capability in the form of PostScript
output, its codebook format has become a standard read by a number of SOM software
solutions, including the SOM Toolbox for Matlab and Viscovery SOMine, where one can then
perform such operations as display of individual component planes, U-matrix, and
sometimes limited clustering. However, there tends to be virtually no user control over design
specifics, like colour schemes and other symbol choices, and the display of features mapped
onto the SOM is extremely limited and virtually useless when dealing with large numbers
of features. This lack of control over the visualization is arguably the main reason for the
widespread uniformity and lack of visual appeal associated with most SOM-based
visualizations described in the literature, with output from the SOM Toolbox being particularly
prevalent. On the other hand, transformation of SOM output into a form compatible with
standard GIS allows tight control over visual appearance and has the added advantage,
compared with most graphic design software, of still being data-driven and thus quickly
adaptable to SOMs of different size and type. We decided to convert the codebook file of
neuron vectors into the ESRI Shapefile format, with neurons represented as hexagonal
polygons and neuron weights for all attributes placed in the associated dBASE file. SOM_PAK's
initial output of the best-matching neuron for each input vector was transformed into a
Shapefile containing a unique point location for each vector, based on random placement
inside the respective best matching neuron.
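The geometry behind this conversion can be sketched as follows. The code assumes a hexagonal lattice with unit spacing between neuron centres, odd rows shifted by half a cell, and pointy-top hexagons; these conventions and all function names are illustrative assumptions, not the conversion tool actually used. Writing the resulting polygons and points to a Shapefile would require a GIS library and is not shown.

```python
import math
import random

def neuron_centre(col, row):
    """Centre of neuron (col, row) on a hexagonal lattice with unit spacing."""
    return col + 0.5 * (row % 2), row * math.sqrt(3) / 2

def hexagon(cx, cy, r):
    """Six vertices of a pointy-top regular hexagon with circumradius r."""
    return [(cx + r * math.cos(math.radians(60 * k + 30)),
             cy + r * math.sin(math.radians(60 * k + 30))) for k in range(6)]

def inside_hexagon(px, py, cx, cy, r):
    """True if (px, py) lies within the pointy-top hexagon centred on (cx, cy)."""
    dx, dy = abs(px - cx), abs(py - cy)
    inradius = r * math.sqrt(3) / 2
    return dx <= inradius and dy <= r - dx / math.sqrt(3)

def random_point_in_hexagon(cx, cy, r, rng):
    """Uniform random point inside the hexagon, by rejection sampling
    from its bounding box."""
    inradius = r * math.sqrt(3) / 2
    while True:
        px = cx + rng.uniform(-inradius, inradius)
        py = cy + rng.uniform(-r, r)
        if inside_hexagon(px, py, cx, cy, r):
            return px, py

rng = random.Random(42)
r = 1 / math.sqrt(3)              # circumradius that tiles a unit-spaced lattice
cx, cy = neuron_centre(3, 1)      # best-matching neuron of some input vector
ring = hexagon(cx, cy, r)         # polygon outline for the neuron
point = random_point_in_hexagon(cx, cy, r, rng)   # point location for the vector
```

Random placement keeps input vectors that share a best-matching neuron from being drawn on top of each other, at the cost of positions within a hexagon carrying no meaning.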
8.4.3 Visualizing the SOM
The use of climate data in this experiment is specifically aimed at observing and
understanding issues arising with a high-resolution SOM. This must precede future studies, in