Towards High-resolution Self-organizing Maps of Geographic Features - Geographic Visualization: Concepts, Tools and Applications

Using SOM_PAK, initial weights of neurons were set randomly and training then proceeded in two stages. During the first stage, 20 000 training runs were performed, with an initial neighbourhood size of 250. This stage serves to represent major, 'global' structures among the input data. The second stage then aims at shaping the representation of regional and local structures. It consisted of 100 000 training runs, with a starting neighbourhood size of 25 neurons. Training took 47 minutes for the first stage and 313 minutes for the second stage (wall clock time), on a single-CPU 2.3 GHz Xeon processor system.
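
The two-stage schedule described above can be illustrated with a minimal sketch in plain Python. It is not SOM_PAK's implementation: the rectangular grid, the 'bubble' neighbourhood, the random drawing of input vectors, the learning rates and the small run counts are all assumptions chosen to keep the example short. Only the overall pattern (a short stage with a wide neighbourhood, followed by a stage with five times as many runs and a tenth of the radius) mirrors the text. The function names train_stage and train_two_stage are hypothetical.

```python
import math
import random


def train_stage(weights, grid, data, runs, radius0, alpha0, rng):
    """One training stage: radius shrinks linearly to 1, learning rate towards 0."""
    for t in range(runs):
        frac = 1.0 - t / runs
        radius = max(1.0, radius0 * frac)
        alpha = alpha0 * frac
        x = rng.choice(data)  # draw one input vector at random (a simplification)
        # Best-matching neuron: smallest squared Euclidean distance to x.
        bmu = min(range(len(weights)),
                  key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
        bx, by = grid[bmu]
        for i, (gx, gy) in enumerate(grid):
            # 'Bubble' neighbourhood: every neuron within the radius is updated.
            if math.hypot(gx - bx, gy - by) <= radius:
                weights[i] = [w + alpha * (v - w) for w, v in zip(weights[i], x)]


def train_two_stage(data, xdim, ydim, seed=0):
    """Train a small SOM in two stages and return (grid positions, weights)."""
    rng = random.Random(seed)
    grid = [(x, y) for y in range(ydim) for x in range(xdim)]
    # Random initial weights, as in the text.
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    # Stage 1: fewer runs, wide neighbourhood (global structure).
    train_stage(weights, grid, data, runs=200,
                radius0=max(xdim, ydim), alpha0=0.5, rng=rng)
    # Stage 2: five times as many runs, a tenth of the radius (local structure).
    train_stage(weights, grid, data, runs=1000,
                radius0=max(xdim, ydim) / 10, alpha0=0.05, rng=rng)
    return grid, weights
```

Calling train_two_stage(data, xdim=4, ydim=3) on a list of equal-length numeric vectors returns the grid positions and the trained weight vectors. The experiment in the text uses far larger run counts and radii than this toy schedule.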

In retrospect, given the number of neurons and the number of input vectors, a larger number of runs might have been preferable, as seen in some of the visualizations below. However, one important lesson to be conveyed in this chapter is that, in order to visualize n-dimensional data, one will often proceed in an iterative manner, especially when it comes to detecting anomalies in the data and setting training parameters. Experience shows that visualization is not only uniquely suited to generating knowledge about the mapped domain as such, but is also a powerful tool for testing and refining visualization methods and data. For example, note below the discussion on the effects of the wind speed variable on the trained SOM.

SOM_PAK was also used to 'map' input vectors onto the trained SOM. Every block group climate vector was compared with all neuron vectors to find the most similar neuron. SOM_PAK thus produced two output files. One is the trained SOM, also known as the codebook file, consisting of a list of all neurons and their final weights for all variables. The other output contains information about the best-matching neuron found for each input vector.
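
The mapping step can be sketched in a few lines of Python. The sketch assumes that 'most similar' means smallest Euclidean distance, which the text does not state explicitly, and the function names and the tiny four-neuron codebook are hypothetical.

```python
def best_matching_neuron(vector, codebook):
    """Return the index of the neuron whose weights are closest to `vector`."""
    def sq_dist(weights):
        # Squared Euclidean distance; the square root does not change the ranking.
        return sum((w - v) ** 2 for w, v in zip(weights, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def map_inputs(vectors, codebook):
    """Map every input vector to the index of its best-matching neuron."""
    return [best_matching_neuron(v, codebook) for v in vectors]


# Toy codebook of four neurons with two variables each.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inputs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]
print(map_inputs(inputs, codebook))  # [0, 3, 2]
```

Each input vector is compared with every neuron, so the cost grows with the product of the number of inputs and the number of neurons. That cost becomes relevant for a high-resolution SOM.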

While SOM_PAK has only rudimentary visualization capability in the form of PostScript output, its codebook format has become a standard read by a number of SOM software solutions, including the SOM Toolbox for Matlab and Viscovery SOMine, where one can then perform such operations as display of individual component planes, U-matrix, and sometimes limited clustering. However, there tends to be virtually no user control over design specifics, like colour schemes and other symbol choices, and the display of features mapped onto the SOM is extremely limited and virtually useless when dealing with large numbers of features. This lack of control over the visualization is arguably the main reason for the widespread uniformity and lack of visual appeal associated with most SOM-based visualizations described in the literature, with output from the SOM Toolbox being particularly prevalent. On the other hand, transformation of SOM output into a form compatible with standard GIS allows tight control over visual appearance and has the added advantage, compared with most graphic design software, of still being data-driven and thus quickly adaptable to SOMs of different size and type. We decided to convert the codebook file of neuron vectors into the ESRI Shapefile format, with neurons represented as hexagonal polygons and neuron weights for all attributes placed in the associated dBASE file. SOM_PAK's initial output of the best-matching neuron for each input vector was transformed into a Shapefile containing a unique point location for each vector, based on random placement inside the respective best-matching neuron.
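
The geometry behind this conversion can be sketched as follows. The code is an illustration, not the conversion tool used for the chapter. It assumes a hexagonal lattice with unit spacing between neighbouring neurons, odd rows shifted half a unit to the right, and pointy-top hexagons. All function names are hypothetical. Writing the resulting polygons and points to a Shapefile would require a GIS library and is not shown.

```python
import math
import random

APOTHEM = 0.5                             # half the spacing between neighbouring neurons
RADIUS = APOTHEM / math.cos(math.pi / 6)  # centre-to-vertex distance


def neuron_centre(col, row):
    """Centre of neuron (col, row); odd rows are shifted half a unit (assumed layout)."""
    return col + 0.5 * (row % 2), row * 1.5 * RADIUS


def hexagon(col, row):
    """Six vertices of the pointy-top hexagon around a neuron centre."""
    cx, cy = neuron_centre(col, row)
    return [(cx + RADIUS * math.cos(math.radians(30 + 60 * k)),
             cy + RADIUS * math.sin(math.radians(30 + 60 * k)))
            for k in range(6)]


def inside(dx, dy):
    """True if the offset (dx, dy) from a neuron centre lies within its hexagon."""
    return abs(dx) <= APOTHEM and abs(dy) <= RADIUS - abs(dx) / math.sqrt(3)


def random_point_in_neuron(col, row, rng):
    """Rejection-sample a point location inside the hexagon of neuron (col, row)."""
    cx, cy = neuron_centre(col, row)
    while True:
        dx = rng.uniform(-APOTHEM, APOTHEM)
        dy = rng.uniform(-RADIUS, RADIUS)
        if inside(dx, dy):
            return cx + dx, cy + dy
```

For example, random_point_in_neuron(3, 1, random.Random(42)) yields one point inside the hexagon of the neuron in column 3, row 1. Calling it once per input vector, with that vector's best-matching neuron, gives every vector its own location even when many vectors share a neuron.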

8.4.3 Visualizing the SOM

The use of climate data in this experiment is specifically aimed at observing and understanding issues arising with a high-resolution SOM. This must precede future studies, in
