of (spatial-temporal) data mining, in particular knowledge discovery in databases
(KDD), provide one such approach to automated knowledge extraction (Streich
2009, p. 252). This chapter has shown how methods of data mining and knowledge
discovery can be applied to a spatial data set, in this particular case, 111 urban
districts in Germany classified by seven dimensions of land use.
Any evaluation on this basis can best be realized through an interdisciplinary
collaboration of computer science (data mining) and the spatial sciences. In the
example given here, the raw data was comprehensively reviewed and pre-processed
to allow the application of methods of data mining and knowledge discovery. A
quantifiable measure was determined to enable the multidimensional comparison of
urban districts. The data was structured using projection, clustering, and machine
learning algorithms. By visualizing results, it was possible to discern prominent
structures and characteristics of the data set, such as correlations, spatial outliers,
and potential clusters (groups). The automated clustering led to the discovery of
clusters of urban districts sharing common features. Representative districts were
selected from each cluster. Any classification of data should, ideally, also support
spatial planning decisions. We can speak of knowledge discovery if the extracted
knowledge is new and nonobvious and if it can also be put to use at the
practical level (e.g., by spatial planners, politicians, decision-makers).
The authors believe that methods are required to explicitly elucidate the relevance
of classes produced by such automated data processing. In some instances, the
wide range of possible interpretations has been explored in a rather unsystematic
fashion by simple inspection of the characteristics of variables (e.g., measures of
central tendency, variability). Approaches which use machine learning can provide
a more systematic review of the complete range of hypotheses. This can produce
decision rules or trees that can then be applied to discover useful and previously
unknown correlations in the data set. Such methods have been presented in this
chapter using the example of a small data set (111 urban districts in Germany)
with only seven dimensions (variables). This exploratory approach is intended
to show how the applied processes can help generate hypotheses as well as
extract important correlations from the data set. Methods of KDD can be used to
produce a much larger number of hypotheses than would be possible manually. Yet
this powerful approach also has a particular drawback, which applies equally
to the results of this chapter: the hypotheses derived
from many data mining procedures must not be interpreted as statistically validated
truths. Rather, they should be understood as suggestions for discussion, which should
be assessed by suitable methods in follow-up spatial investigations.
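
The workflow summarized above (standardize the variables, cluster the districts, pick a representative per cluster, derive a simple rule that describes each cluster) can be illustrated with a minimal sketch. It is not the chapter's implementation. The data are synthetic stand-ins, the distance measure is assumed to be Euclidean distance on z-scores, plain k-means is swapped in for the unnamed clustering algorithm, and a single-variable threshold search stands in for the decision rules or trees. All function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(r, centroids[j]))
                  for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def representatives(rows, labels, centroids):
    """Map each non-empty cluster to the index of its member nearest the centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps[j] = min(members, key=lambda i: math.dist(rows[i], c))
    return reps


def best_rule(rows, labels, target):
    """Search single-variable thresholds that separate `target` from the rest.

    Returns (accuracy, variable_index, threshold, direction). The result is a
    candidate hypothesis only, not a statistically validated finding.
    """
    truth = [lab == target for lab in labels]
    best = (0.0, None, None, None)
    for v in range(len(rows[0])):
        values = sorted({r[v] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for direction in (">", "<="):
                pred = [(r[v] > t) if direction == ">" else (r[v] <= t)
                        for r in rows]
                acc = sum(p == y for p, y in zip(pred, truth)) / len(rows)
                if acc > best[0]:
                    best = (acc, v, t, direction)
    return best


# Synthetic stand-in data (NOT the chapter's data): 111 districts x 7 variables,
# drawn around three made-up group centres.
rng = random.Random(42)
centres = [[rng.uniform(0, 100) for _ in range(7)] for _ in range(3)]
raw = [[rng.gauss(m, 5.0) for m in centres[i % 3]] for i in range(111)]

data = standardize(raw)
labels, centroids = kmeans(data, k=3)
reps = representatives(data, labels, centroids)
for cluster, idx in sorted(reps.items()):
    acc, var, thr, direction = best_rule(data, labels, cluster)
    print(f"cluster {cluster}: representative district #{idx}, candidate rule: "
          f"variable {var} {direction} {thr:.2f} (accuracy {acc:.2f})")
```

The printed rules describe the clusters that the algorithm itself produced, so their accuracy says nothing about whether the clusters are meaningful for planning. In line with the caveat above, they are starting points for discussion and follow-up investigation.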
For example, it is possible to find data subsets that are possibly irrelevant, that
have already been discovered (rediscovered, and thus not new for the target/application
domain), or that are highly uncertain. Some subsets require a more detailed investigation
(e.g., “coastal urban districts with a large mesh size due to linear transport
infrastructure”, “Urban Protected Areas”, subclasses of the variable SealedSurface).
It should be emphasized that the extraction of knowledge for spatial planning
generally requires further investigations, validations, and tests. In particular, it can
be helpful to apply other methods to confirm specific hypotheses extracted by the