Geoscience Reference
In-Depth Information
In the next section, we evaluate this problem in detail and illustrate how different normalisation
procedures, clustering algorithms and their implementation on parallel architectures of graphics
cards can enhance the speed with which classifications can be compiled. This is essential if
geodemographics are to be reoriented from static systems into problem-orientated flexible pattern
finding tools.
3.8 TOWARDS REAL-TIME GEOCOMPUTATION OF
GEODEMOGRAPHICS USING GPU
As noted earlier, the k -means clustering algorithm remains a core algorithm for the computation
of geodemographic classifications and is typically used to create the finest-level geodemographic
classes. The algorithm seeks to find the set of cluster centroids that minimises the following:
n
n
11
2
V
=
(
z
µ
)
(3.1)
x
y
x
==
y
where
n is the number of clusters
μ y is the mean centroid of all the points z x in cluster y
The k -means algorithm randomly assigns a set of n seeds within the data set and then proceeds
by assigning each datapoint to its nearest seed. Cluster centroids are then created for each cluster,
and the datapoints are assigned to the nearest centroid. The algorithm then recalculates the cluster
centroids and repeats these steps until a convergence criterion is met (usually when switching of
datapoints no longer takes place between the clusters).
However, there is evidence that the standard implementation of k -means requires multiple runs
of individual instances of the algorithm to create a robust classification (Singleton and Longley
2009), adding considerably to the computational burden. Some improvements to the k -means clus-
tering algorithm have been implemented by Reynolds et al. (2006) as k-means ++ , in which initial
seeds are assigned more intelligently by searching for density patterns within the attribute space,
thus reducing the time required to find an optimal solution. Other possibilities include the use of
algorithms that implement hierarchical clustering, partitioning around medoids (PAM) and genetic
clustering algorithms (Adnan 2011).
Adnan (2011) describes one implementation of a parallel k -means algorithm using CUDA that
works as follows:
1. The CPU prepares the datapoints and counts the number of GPU available on the NVIDIA
graphics card. Afterwards, the CPU uploads the datapoints and code assigning one k -means
run to each GPU.
2. Each GPU performs k -means clustering on the datapoints by minimising a cluster solution
based upon an initial set of seed sites. When an optimal solution is achieved, each GPU
returns the result to the CPU and claims the next available k -means run from the CPU if
there are any.
3. The CPU stores the results returned by each GPU in a local data structure contained in
random access memory (RAM). The CPU continues to delegate requests to GPU until the
remaining number of runs is equal to the total number of GPU.
4. After completion of the final set of iterations, the CPU compares the within sum of squares
distance optimisation criterion of all the runs.
5. The optimal solution is the one that minimises the within sum of squares distance within
each cluster.
Search WWH ::




Custom Search