Databases Reference
In-Depth Information
11.4.1 CategorizationofConstraints
This section studies how to categorize the constraints used in cluster analysis. Specifi-
cally, we can categorize constraints according to the subjects on which they are set, or
on how strongly the constraints are to be enforced.
As discussed in Chapter 10, cluster analysis involves three essential aspects: objects
as instances of clusters, clusters as groups of objects, and the similarity among objects.
Therefore, the first method we discuss categorizes constraints according to what they are
applied to. We thus have three types: constraints on instances , constraints on clusters , and
constraints on similarity measurement .
Constraints on instances: A constraint on instances specifies how a pair or a set of
instances should be grouped in the cluster analysis. Two common types of con-
straints from this category include:
Must-link constraints. If a must-link constraint is specified on two objects x and
y , then x and y should be grouped into one cluster in the output of the cluster
analysis. These must-link constraints are transitive. That is, if must-link
.
x , y
/
and
must-link
.
Cannot-link constraints. Cannot-link constraints are the opposite of must-link
constraints. If a cannot-link constraint is specified on two objects, x and y ,
then in the output of the cluster analysis, x and y should belong to different
clusters. Cannot-link constraints can be entailed. That is, if cannot-link
.
y , z
/
, then must-link
.
x , z
/
.
x , y
/
,
x , x 0
y , y 0
x 0 , y 0
must-link
.
/
, and must-link
.
/
, then cannot-link
.
/
.
A constraint on instances can be defined using specific instances. Alternatively, it
can also be defined using instance variables or attributes of instances. For example, a
constraint,
Constraint
.
x , y
/
: must-link
.
x , y
/
if dist
.
x , y
/
,
uses the distance between objects to specify a must-link constraint.
Constraints on clusters: A constraint on clusters specifies a requirement on the clusters,
possibly using attributes of the clusters. For example, a constraint may specify the
minimum number of objects in a cluster, the maximum diameter of a cluster, or the
shape of a cluster (e.g., a convex). The number of clusters specified for partitioning
clustering methods can be regarded as a constraint on clusters.
Constraints on similarity measurement: Often, a similarity measure, such as Eucli-
dean distance, is used to measure the similarity between objects in a cluster anal-
ysis. In some applications, exceptions apply. A constraint on similarity measurement
specifies a requirement that the similarity calculation must respect. For example, to
cluster people as moving objects in a plaza, while Euclidean distance is used to give
 
Search WWH ::




Custom Search