Geoscience Reference
In-Depth Information
of statistics and machine learning. These clustering algorithms can be broadly
classified by the paradigm they use. One of the most prominent and widely used
clustering paradigms is partitioning clustering. Partitioning clustering algorithms,
such as k-means or neural gas (NG; Martinetz and Schulten 1991), divide a set
of observations into a set of nonoverlapping clusters. Each observation is assigned
to the cluster to which it is closest. For large data sets, partitioning clustering
algorithms are typically more computationally efficient than, e.g., hierarchical
clustering algorithms (Jain et al. 1999). However, a severe disadvantage is that
they require the analyst to choose the desired number of clusters beforehand.
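The assignment rule described above can be illustrated with a minimal k-means
sketch. It is an illustration only, not the implementation used in any of the cited
works. The function names, the toy observations, and the fixed iteration count are
assumptions made for this example. Note that the number of clusters k has to be
passed in by the caller, which is the disadvantage mentioned above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(observations, centers):
    """For each observation, return the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(obs, centers[j]))
        for obs in observations
    ]


def k_means(observations, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm); k must be chosen beforehand."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct observations drawn at random.
    centers = rng.sample(observations, k)
    for _ in range(iterations):
        labels = assign(observations, centers)
        for j in range(k):
            members = [o for o, l in zip(observations, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(observations, centers)


# Hypothetical toy data: two well-separated groups of three observations each.
observations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(observations, k=2)
```

Each observation ends up in exactly one cluster, so the resulting clusters do not
overlap.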
There are several important special cases of clustering. One such case is spatial
clustering, which deals with the clustering of spatially located observations. A basic
property of such observations is that they are likely to be spatially dependent. Spatial
dependence means that observations that are located close to each other
tend to have similar characteristics. This property is essential to spatial sciences
because without it the variation of phenomena would be independent of location and
thus the notion of region would be meaningless (Goodchild 1986). The
presence of spatial dependence has traditionally been regarded as problematic
for statistical analysis, which typically requires sample independence (Bailey and
Gatrell 1995). However, it can also serve as a valuable source of information about
spatial processes, because it provides evidence of causality (Miller 2004). Therefore,
it is generally useful for spatial clustering algorithms to take spatial dependence into
account in order to utilize the full range of available information for discovering
spatial patterns.
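One common way to quantify spatial dependence is Moran's I, a global measure of
spatial autocorrelation. The text above does not name this statistic, so it is brought
in here only as an illustration. The sketch below assumes a binary neighbor matrix
and hypothetical values observed at six locations along a line.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all pairs of locations.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n):
    """Binary weights for n locations on a line: adjacent locations are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


weights = chain_weights(6)
# Neighboring locations have similar values: positive spatial dependence.
smooth = morans_i([1, 2, 3, 4, 5, 6], weights)
# Neighboring locations have dissimilar values: negative spatial dependence.
alternating = morans_i([1, 6, 1, 6, 1, 6], weights)
```

A positive value indicates that nearby observations tend to be similar, while a
negative value indicates that nearby observations tend to differ.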
Spatial clustering is of special importance for spatial planning tasks: Administrative
areas typically have their roots in historic administrative divisions of space,
which disregard the nonspatial characteristics of place. As a consequence, administrative
divisions often intersect contiguous regions and are often inhomogeneous.
Decisions made concerning the planning, distribution, and allocation of resources
among such administrative areas are likely to be ineffective and meaningless
(Amedeo 1969). In fact, it has been shown by Van Der Laan and Schalke (2001)
that local policies are more effective for homogeneous regions. These concerns
are very closely related to the modifiable areal unit problem (MAUP; Openshaw
1984). Spatial analysis typically requires manageable discrete descriptions of spatial
processes, which are continuous. For this purpose, it is necessary to aggregate
observations over areal units. The MAUP states that the outline of these units and the
scale of aggregation critically affect the results of any spatial analysis. In general, it
is useful if the observations that are aggregated over the same areal unit are similar
to each other. Consequently, since spatial clustering outlines mostly coherent and
homogeneous areas, it has the potential to serve as a valuable tool for spatial planning
and analysis tasks (e.g., Helbich et al. 2013).
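The effect of the areal units on aggregated results, as described for the MAUP above,
can be shown with a small sketch. The observations and the two zoning schemes are
hypothetical and chosen only to make the effect visible. The helper name zone_means
is likewise an assumption for this example.

```python
from collections import defaultdict


def zone_means(values, zones):
    """Average the observations that fall into the same areal unit."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, zone in zip(values, zones):
        totals[zone] += value
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical observations along a transect.
values = [1, 1, 9, 9, 1, 1, 9, 9]

# Four small units that each contain similar observations.
fine_zones = [0, 0, 1, 1, 2, 2, 3, 3]
# Two large units that each mix dissimilar observations.
coarse_zones = [0, 0, 0, 0, 1, 1, 1, 1]

fine = zone_means(values, fine_zones)      # unit means alternate between 1 and 9
coarse = zone_means(values, coarse_zones)  # both unit means are 5
```

With the fine zoning the contrast between low and high values is preserved, whereas
the coarse zoning averages it away. The same observations thus lead to different
conclusions depending on the outline and scale of the units.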
Various spatial clustering algorithms have been developed in the past (see,
e.g., Han et al. 2001). Most of these methods are based on general-purpose
clustering algorithms that have limited capabilities in recognizing spatial patterns
that involve neighbors or cannot deal with high-dimensional data sets (Guo et al.
2003). Contextual neural gas (CNG; Hagenauer and Helbich 2013) is a recently