clusters. Starting from the signatures $S_i$ of the pixels, the signature of a cluster $C$
is defined as the average signature of the pixels within that cluster:
$$S_C = \langle S_i \rangle_{i \in C}. \tag{15.3}$$
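Combining all activity types, the K-means algorithm aims at minimizing the
quantity $E_K$, which measures the total distance between the locations' signatures and their
cluster's signature:
$$E_K = \sum_{k=1}^{K} \sum_{i \in C_k} \mathrm{dist}(i, C_k), \tag{15.4}$$
where the distance $\mathrm{dist}(i, C)$ between a pixel $i$ and a cluster $C$ is defined as
$$\mathrm{dist}(i, C) = \sum_{\text{types}} \sum_{t} \left[ S_i(t) - S_C(t) \right]^2, \tag{15.5}$$
with the outer sum running over the activity types (each with its own signature) and the inner
sum over the time intervals $t$, and $K$ is a pre-imposed number of clusters. This simple quantity does not take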
into account the temporal structure of the signatures (the order of the different time
intervals does not matter), but in the following, it will prove to deliver consistent
results. Each pixel is characterized here by a 3,360-dimensional feature vector (5
signatures of different types, each valued on 672 time intervals).
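
The definitions in Eqs. (15.3)-(15.5) can be illustrated with a minimal sketch in plain Python. It assumes that each pixel's signatures have already been concatenated into one flat feature vector, so that the double sum over activity types and time intervals becomes a single sum over vector entries. The function names and the toy data (4 pixels, 2 types, 3 time intervals) are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Dict, List, Sequence

Signature = Sequence[float]  # one pixel's concatenated feature vector


def cluster_signature(signatures: List[Signature]) -> List[float]:
    """Average signature of the pixels in one cluster (Eq. 15.3)."""
    n = len(signatures)
    return [sum(values) / n for values in zip(*signatures)]


def dist(signature: Signature, centroid: Signature) -> float:
    """Squared Euclidean distance between a pixel and a cluster signature (Eq. 15.5)."""
    return sum((s - c) ** 2 for s, c in zip(signature, centroid))


def total_distance(signatures: List[Signature], labels: List[int]) -> float:
    """E_K: distances of all pixels to their own cluster's signature (Eq. 15.4)."""
    clusters: Dict[int, List[Signature]] = {}
    for sig, label in zip(signatures, labels):
        clusters.setdefault(label, []).append(sig)
    total = 0.0
    for members in clusters.values():
        centroid = cluster_signature(members)
        total += sum(dist(sig, centroid) for sig in members)
    return total


# Hypothetical toy data: 4 pixels, 2 activity types x 3 time intervals each,
# flattened to 6 values per pixel (the study uses 5 x 672 = 3,360 values).
pixels = [
    [1.0, 2.0, 3.0, 0.0, 0.0, 1.0],
    [1.0, 2.0, 5.0, 0.0, 0.0, 1.0],
    [9.0, 8.0, 7.0, 4.0, 4.0, 4.0],
    [9.0, 8.0, 7.0, 4.0, 6.0, 4.0],
]
labels = [0, 0, 1, 1]  # a given assignment of pixels to K = 2 clusters
E_K = total_distance(pixels, labels)
```

Because the distance is a plain sum of squared differences, shuffling the time intervals consistently across all pixels would leave `E_K` unchanged, which is the sense in which the temporal order does not matter.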
A notable drawback of the K-means algorithm is the difficulty of determining the
“best” number K of clusters, whose value can depend on the shape and scale of the
distribution of points in a dataset and the desired clustering resolution. Different ad
hoc techniques to make that decision exist, most of them based on finding the value
of $K$ that best balances minimizing the intra-cluster distance $E_K$ against
maximizing the inter-cluster distances. There is, however, no consensus on the best
method to use, and the correct choice of K may also often rely on the researchers'
expert opinion and search for interpretable results. We guided our choice by looking
at local maxima of the silhouette index (Rousseeuw 1987). All cities presented local
maxima for $K = 2$ clusters (corresponding roughly to city centers and city suburbs),
$K = 6$, and larger values of $K$ which vary with the studied city. All results presented
in the following have been obtained for $K = 6$, which is the most relevant case.
We indeed found that allowing a larger number of clusters mostly added clusters
composed of very few pixels in areas with very low mobile phone activity and
without any regular signatures.
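
As an illustration of how a silhouette index can guide the choice of the number of clusters, the following sketch runs K-means for several values of K on toy data and compares the resulting mean silhouette values. Several choices here are assumptions made for the example and are not stated in the source: the farthest-point initialisation (used only to make the run deterministic), the Euclidean distance inside the silhouette, the function names, and the toy data (three well-separated groups of four points in two dimensions).

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance, as in the K-means objective."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Lloyd iterations with farthest-point initialisation (an assumption
    made here for determinism; the source does not specify one)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def silhouette_index(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette value over all points (Rousseeuw 1987)."""
    scores = []
    for i, p in enumerate(points):
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three groups of four points each.
blobs = [(0, 0), (10, 0), (0, 10)]
points = [[cx + dx, cy + dy]
          for cx, cy in blobs for dx in (0.0, 1.0) for dy in (0.0, 1.0)]
scores = {k: silhouette_index(points, kmeans(points, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
```

On this toy data the index peaks at the number of groups that were built in. On real signatures the curve can show several local maxima, in which case the choice among them still depends on interpretability.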
15.5.2 Revealing the Spatial Structures of Cities
We conducted an independent K-means clustering analysis for each city. As we
previously stated, the best cluster size distribution and interpretability were achieved
for $K = 6$ in each case. Figures 15.6a-15.8a show the spatial projections of the