Knowledge Discovery in Spatial Planning Data: A Concept for Cluster Understanding - Computational Approaches for Urban Environments - page 57


3.3.4 Definition of a Distance Measure for the High-Dimensional Data

To project the high-dimensional data onto a space that can be visualized, or to identify clusters in the data, a meaningful (dis-)similarity measure (data distance) must be defined. The measure must place similar data close together and differing data far apart, and each variable should be well captured by it. Two highly correlated variables carry essentially the same information. If both are included in a data distance, that information is effectively counted twice. One simple way to address this effect is to exclude highly correlated variables from the definition of the data distance.
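The variable selection described above can be illustrated with a minimal sketch. It assumes a greedy filter on absolute Pearson correlation with a cutoff of 0.9; neither the greedy rule nor the cutoff comes from the text, and the function name `select_uncorrelated` and the toy columns are hypothetical.

```python
import numpy as np

def select_uncorrelated(data, names, threshold=0.9):
    """Greedily keep variables that are not highly correlated with any kept one.

    data:      array of shape (n_samples, n_variables)
    names:     list of variable names, one per column
    threshold: assumed cutoff on the absolute Pearson correlation
    """
    # Absolute correlation matrix between the columns (variables).
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(data.shape[1]):
        # Keep column j only if it adds information beyond the kept columns.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data: "A_copy" is almost a rescaled duplicate of "A"; "B" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
data = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=200), b])

selected = select_uncorrelated(data, ["A", "A_copy", "B"])
print(selected)
```

In this greedy scheme the result depends on column order: the first variable of a correlated group is kept and later ones are dropped, so in practice the order would be chosen so that the most interpretable variable of each group comes first.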

For the UD data, it makes sense to include only the four variables OpenSpaceMeshSize, BuildingArea, SealedSurface, and ProtectedAreas. The other variables are highly correlated with this subset, so their information is already contained in the selected variables. The distance is computed on the transformed variables; otherwise, the differences between two data points within a variable would not be comparable (cf. Fig. 3.9). To adjust the scaling, all data were rescaled to percent.
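As a rough illustration of why a common scale matters for a data distance, the sketch below rescales each variable to the range 0 to 100 and then computes pairwise distances. Two points are assumptions rather than statements from the text: "rescaled to percent" is read here as min-max scaling per variable, and the Euclidean distance is used as the data distance. The variable transformation itself is not reproduced, and the helper names and toy values are hypothetical.

```python
import numpy as np

def rescale_to_percent(data):
    """Min-max rescale each column to [0, 100] (assumed reading of 'percent')."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Guard against constant columns to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return 100.0 * (data - lo) / span

def pairwise_euclidean(data):
    """Return the (n_samples, n_samples) matrix of Euclidean distances."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data: two variables on different raw scales (values are made up).
raw = np.array([
    [0.0, 10.0],
    [5.0, 20.0],
    [10.0, 30.0],
])

scaled = rescale_to_percent(raw)
dist = pairwise_euclidean(scaled)
print(scaled)
print(dist)
```

After rescaling, a difference of one unit means the same fraction of the observed range in every variable, so no single variable dominates the distance merely because of its raw units.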

[Figure: two panels, "Selected UD data" (left) and "Transformed UD data" (right)]

Fig. 3.9 Comparison of the distributions of original (left) and transformed variables (right). Blue = OpenSpaceMeshSize, red = BuildingArea, green = SealedSurfaces, black = ProtectedAreas
