because a 400-cluster solution for 51 objects is not very useful. Instead, the SOM allows
detailed two-dimensional layout of the geographic objects. Those states still assigned to a
single neuron (e.g. Louisiana (LA) and Mississippi (MS)) are too similar to be distinguishable
even at this level of granularity.
The detail provided in high-resolution SOMs makes it possible to have them play the role
of a base map onto which various other data can be mapped. This is possible because
a SOM, in contrast to methods such as multidimensional scaling (MDS) or spring models,
does not directly represent the input vectors as such. Instead, it creates a low-
dimensional output model of the n-dimensional input space. That model can be applied to
other data, as long as they have the same dimensionality. Once those data are mapped onto
the SOM, other features can be attached. For example, if a SOM is constructed from multi-
temporal demographic attributes of geographic objects, one could link individual temporal
vertices to form trajectories and then visualize previously unrelated attributes onto those
trajectories (Skupin and Hagelman, 2005). From clustering to labelling of neuron regions, a
number of transformations have been proposed that all depend on a view of high-resolution
SOMs as base maps (Skupin, 2004).
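The base-map idea can be illustrated with a minimal sketch. It assumes a trained SOM is available as a plain dictionary from lattice position to weight vector; the names `codebook`, `best_matching_unit` and `trajectory`, and all numeric values, are hypothetical and not taken from any particular SOM package. Any vector with the same dimensionality as the model is placed at its closest neuron, and the positions of one object at successive time steps are linked into a trajectory.

```python
from math import dist


def best_matching_unit(codebook, vector):
    """Return the lattice position of the neuron closest to `vector`.

    `codebook` maps (row, col) -> weight vector. The input must have the
    same dimensionality as the weight vectors of the trained model.
    """
    if any(len(weights) != len(vector) for weights in codebook.values()):
        raise ValueError("input and SOM weights must have the same dimensionality")
    return min(codebook, key=lambda pos: dist(codebook[pos], vector))


def trajectory(codebook, snapshots):
    """Map one object's time-ordered attribute vectors to a path of neurons."""
    return [best_matching_unit(codebook, vector) for vector in snapshots]


# Toy 2 x 2 SOM with 3-dimensional weights (hypothetical values).
codebook = {
    (0, 0): [0.0, 0.0, 0.0],
    (0, 1): [1.0, 0.0, 0.0],
    (1, 0): [0.0, 1.0, 0.0],
    (1, 1): [1.0, 1.0, 1.0],
}

# One geographic object observed at three points in time (hypothetical values).
state_by_year = [[0.1, 0.0, 0.1], [0.9, 0.1, 0.0], [0.9, 0.9, 0.8]]
path = trajectory(codebook, state_by_year)
print(path)
```

Because the result is only a list of lattice coordinates, it can be stored as ordinary line geometry, and further attributes can be attached to its vertices.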
Most SOM software solutions provide limited support for effectively storing and trans-
forming large neuron lattices and derived data, such as trajectories and surfaces. One al-
ternative is to leverage the ability of GIS to deal with large, low-dimensional geometric
data sets. Within GIS one can first choose among the various geometric data models, have
access to various database solutions and perform a wide array of transformations, from
interpolation to overlay operations. Finally, in the hands of a cartographer, GIS can produce
attractive visualizations with a large degree of automation (for example for the complex task
of feature labelling), while still performing data-driven visualization. Use of GIS can thus
make high-resolution SOMs a much less daunting proposition on many levels.
8.3.2 Examples of high-resolution SOMs
Most SOM implementations are based on lattices of no more than a few hundred neurons,
and typically far fewer than that. A few examples of large SOMs exist, though. Most of these
were in fact created by the research group around the method's inventor, Teuvo Kohonen.
In the mid-1990s they mapped more than 130 000 newsgroup postings onto a SOM that
eventually consisted of 49 152 neurons, though in a two-stage process that began with a
much smaller SOM of 768 units, from which the larger SOM was interpolated and further
training was then applied (Kohonen et al., 1996b). By far the largest SOM known was
created from the text of almost 7 million patent applications (Kohonen, 2001). Training was
a three-step process, during which progressively finer SOMs were created, beginning with
a 435-neuron SOM and eventually leading to a model consisting of 1 002 240 map units.
Training took 6 weeks on a six-processor computer system.
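The coarse-to-fine idea behind such multi-stage training can be sketched as follows. This is not the procedure used in the cited work; it only assumes a rectangular lattice stored as nested lists and uses plain bilinear interpolation to initialize a finer lattice from a trained coarse one, after which training would continue. The names `upsample_codebook` and `_lerp` are hypothetical. For orientation, going from 768 to 49 152 units is a 64-fold increase, which corresponds to a factor of 8 along each lattice axis if both axes are scaled equally.

```python
def _lerp(a, b, t):
    """Linear interpolation between two weight vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def upsample_codebook(coarse, factor):
    """Bilinearly interpolate a rows x cols lattice of weight vectors.

    Returns a (rows * factor) x (cols * factor) lattice whose corner
    neurons coincide with the corner neurons of the coarse lattice.
    """
    rows, cols = len(coarse), len(coarse[0])
    new_rows, new_cols = rows * factor, cols * factor
    fine = []
    for i in range(new_rows):
        # Position of the fine row expressed in coarse-lattice coordinates.
        y = i * (rows - 1) / max(new_rows - 1, 1)
        y0 = min(int(y), max(rows - 2, 0))
        y1 = min(y0 + 1, rows - 1)
        row = []
        for j in range(new_cols):
            x = j * (cols - 1) / max(new_cols - 1, 1)
            x0 = min(int(x), max(cols - 2, 0))
            x1 = min(x0 + 1, cols - 1)
            top = _lerp(coarse[y0][x0], coarse[y0][x1], x - x0)
            bottom = _lerp(coarse[y1][x0], coarse[y1][x1], x - x0)
            row.append(_lerp(top, bottom, y - y0))
        fine.append(row)
    return fine


# Toy 2 x 2 lattice with 1-dimensional weights (hypothetical values).
coarse = [[[0.0], [1.0]], [[2.0], [3.0]]]
fine = upsample_codebook(coarse, 2)
print(len(fine), len(fine[0]))
```

The interpolated weights serve only as a starting point: the finer lattice begins in roughly the right arrangement, so the subsequent training has to refine local detail rather than discover the global ordering from scratch.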
Training speed is not merely a function of the number of neurons, but also of the
model's dimensionality. Text documents tend to be represented with much longer vec-
tors than other data. The demographic data visualized in Figures 8.1-8.3 includes 32 at-
tributes, while Skupin's visualization of AAG conference abstracts represented each abstract
as a 741-dimensional vector (Skupin, 2002). At that time, training of a 4800-neuron SOM
with the conference abstracts took 3 hours. Training of a much higher-resolution, yet very