Databases Reference
In-Depth Information
A cut is sparsest if its sparsity is not greater than the sparsity of any other cut. There may
be more than one sparsest cut.
In Example 11.21 and Figure 11.14, C 2 is a sparsest cut. Using sparsity as the objective
function, a sparsest cut tries to minimize the number of edges crossing the partitions and
balance the partitions in size.
Consider a clustering on a graph G D.
that partitions the graph into k clusters.
The modularity of a clustering assesses the quality of the clustering and is defined as
V , E
/
l i
j E j
2 !
d i
2j E j
k X
Q D
,
(11.39)
i D1
where l i is the number of edges between vertices in the i th cluster, and d i is the sum of
the degrees of the vertices in the i th cluster. The modularity of a clustering of a graph is
the difference between the fraction of all edges that fall into individual clusters and the
fraction that would do so if the graph vertices were randomly connected. The optimal
clustering of graphs maximizes the modularity.
Theoretically, many graph clustering problems can be regarded as finding good cuts,
such as the sparsest cuts, on the graph. In practice, however, a number of challenges
exist:
High computational cost: Many graph cut problems are computationally expen-
sive. The sparsest cut problem, for example, is NP-hard. Therefore, finding the
optimal solutions on large graphs is often impossible. A good trade-off between
efficiency/scalability and quality has to be achieved.
Sophisticated graphs: Graphs can be more sophisticated than the ones described
here, involving weights and/or cycles.
High dimensionality: A graph can have many vertices. In a similarity matrix, a vertex
is represented as a vector (a row in the matrix) with a dimensionality that is the
number of vertices in the graph. Therefore, graph clustering methods must handle
high dimensionality.
Sparsity: A large graph is often sparse, meaning each vertex on average connects to
only a small number of other vertices. A similarity matrix from a large sparse graph
can also be sparse.
There are two kinds of methods for clustering graph data, which address these
challenges. One uses clustering methods for high-dimensional data, while the other is
designed specifically for clustering graphs.
The first group of methods is based on generic clustering methods for high-
dimensional data. They extract a similarity matrix from a graph using a similarity
measure such as those discussed in Section 11.3.2. A generic clustering method can
then be applied on the similarity matrix to discover clusters. Clustering methods for
 
Search WWH ::




Custom Search