Civil Engineering Reference
In-Depth Information
for which such centroid is the nearest according to a pre-specifi ed measure
of distance/similarity. Thus, the network representation is given by the union
of the Voronoi regions, each including a centroid and its associated elements
(Filippone et al. , 2008).
Conceptually, most approaches to clustering are based on developing a
similarity measure m ij between pairs of vertices ( v i , v j ) and involve an itera-
tive process of vertex grouping up to a point where a maximum similarity
value is achieved. In infrastructure systems, similarity is commonly defi ned
as a measure of distance between two vertices; two cases in point are the
Euclidean distance or the Pearson correlation between columns (or rows)
of the adjacency matrix (Wasserman and Faust, 1994). The selection of
a similarity function defi nes the clustering process. Therefore, the same
network may produce different clusters depending upon the similarity
function used.
Because of the complexity of the clustering problem, exact optimization
is not practical. Commonly, evolutionary computation and other heuristics
are used. The evaluation of clustering performance is a diffi cult task since
an expected solution or correct partitioning is usually unavailable. Conse-
quently, most often the quality of clustering is measured based on the
structure of the partitioning and relational characteristics of the network.
Many approaches seek to measure the balance of density within clusters
versus sparsity between clusters.
There is no standard approach for network clustering; it depends on the
context, type of network, objective, and computational cost. Clustering
methods can be classifi ed into partitional (Newman, 2004; Newman and
Leicht, 2007) and hierarchical methods, which in turn can be divided into
top-down and bottom-up (Peng and Xia, 2005), depending on whether they
construct hierarchies by dividing the whole graph into partitions or by
merging individual elements until the graph is complete.
Although clustering is unsupervised in nature, some methods may fre-
quently require a priori information; thus, there is a distinction between
supervised and unsupervised algorithms. Some examples of the former are
the NJW-Method (Ng et al. , 2001) and the k -means algorithm (McQueen,
1967), which require information about the number of clusters in advance.
On the other hand, a well-known unsupervised method is the Markov
clustering algorithm, MCL (van Dongen, 2000). Furthermore, more ela-
borate clustering methods include kernel-based methods (Graepel et al. ,
1998; Scholkopf et al. , 1998; Inokuchi and Miyamoto, 2004) and spectral
methods (Meila and Shi, 2001; Ng et al. , 2001; Verma and Meila, 2003).
The basic idea behind kernel-based methods is to implicitly map data
to high-dimensional spaces where the classifi cation can be done more
easily, whereas spectral methods rely on eigenvalue analysis to perform
partitionings.
Search WWH ::




Custom Search