the walking distance between two points, a constraint on similarity measurement is
that the trajectory implementing the shortest distance cannot cross a wall.
There can be more than one way to express a constraint, depending on the category.
For example, we can specify a constraint on clusters as
Constraint$_1$: the diameter of a cluster cannot be larger than $d$.
The requirement can also be expressed using a constraint on instances as
$$\text{Constraint}'_1:\ \text{cannot-link}(x, y)\ \text{if}\ \operatorname{dist}(x, y) > d. \tag{11.41}$$
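A minimal Python sketch can confirm that the two formulations accept and reject the same clusterings on a toy data set. It assumes points are coordinate tuples and uses Euclidean distance (`math.dist`) for dist. That choice is an assumption, since the text does not fix the measure. The function names are illustrative and do not come from any library.

```python
from itertools import combinations
from math import dist  # Euclidean distance; Python 3.8+


def cluster_diameter(points):
    """Largest pairwise distance inside one cluster (0.0 for fewer than two points)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def satisfies_diameter(points, labels, d):
    """Constraint_1: every cluster has diameter at most d."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return all(cluster_diameter(members) <= d for members in clusters.values())


def cannot_link_pairs(points, d):
    """Constraint'_1: index pairs (i, j) with dist > d must not share a cluster."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) > d
    }


def satisfies_cannot_link(labels, pairs):
    """True if no cannot-link pair is placed in the same cluster."""
    return all(labels[i] != labels[j] for i, j in pairs)


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
d = 2.0
pairs = cannot_link_pairs(points, d)
for labels in ([0, 0, 1, 1], [0, 0, 0, 1]):
    # Both formulations should give the same verdict for each clustering.
    print(labels, satisfies_diameter(points, labels, d), satisfies_cannot_link(labels, pairs))
# prints:
# [0, 0, 1, 1] True True
# [0, 0, 0, 1] False False
```

Enumerating all pairs is quadratic in the number of points, which is fine for illustration but would need indexing or pruning on large data sets.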
Example 11.22 Constraints on instances, clusters, and similarity measurement. AllElectronics clusters
its customers so that each group of customers can be assigned to a customer relationship
manager. Suppose we want to specify that all customers at the same address are to be
placed in the same group, which would allow more comprehensive service to families.
This can be expressed using a must-link constraint on instances:
$$\text{Constraint}_{\text{family}}(x, y):\ \text{must-link}(x, y)\ \text{if}\ x.\text{address} = y.\text{address}.$$
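The must-link constraint can be generated mechanically from customer records. The sketch below assumes each customer is a dictionary with an `address` field and compares addresses by exact string equality. Real addresses would need normalization first. The helpers `family_must_link_pairs` and `satisfies_must_link` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def family_must_link_pairs(customers):
    """Constraint_family: every two customers at the same address form a must-link pair."""
    by_address = defaultdict(list)
    for cid, record in customers.items():
        by_address[record["address"]].append(cid)
    return {
        (a, b)
        for ids in by_address.values()
        for a, b in combinations(sorted(ids), 2)
    }


def satisfies_must_link(labels, pairs):
    """True if both customers of every must-link pair share a cluster label."""
    return all(labels[a] == labels[b] for a, b in pairs)


# Hypothetical customer records; only the address field matters here.
customers = {
    "c1": {"address": "12 Elm St"},
    "c2": {"address": "12 Elm St"},
    "c3": {"address": "7 Oak Ave"},
}
pairs = family_must_link_pairs(customers)
print(pairs)                                                    # {('c1', 'c2')}
print(satisfies_must_link({"c1": 0, "c2": 0, "c3": 1}, pairs))  # True
print(satisfies_must_link({"c1": 0, "c2": 1, "c3": 1}, pairs))  # False
```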
AllElectronics has eight customer relationship managers. To ensure that they each
have a similar workload, we place a constraint on clusters such that there should be
eight clusters, and each cluster should have at least 10% of the customers and no more
than 15% of the customers. We can calculate the spatial distance between two customers
using the driving distance between the two. However, if two customers live in different
countries, we have to use the flight distance instead. This is a constraint on similarity
measurement.
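Both the cluster-level constraint and the similarity constraint can be written as simple checks. This sketch assumes cluster labels are given as a list with one label per customer and that each customer record carries a `country` field. The functions `driving_distance` and `flight_distance` are hypothetical caller-supplied callables, because the text does not say how either distance is computed.

```python
from collections import Counter


def satisfies_size_constraint(labels, k=8, low_pct=10, high_pct=15):
    """Exactly k non-empty clusters, each holding low_pct%..high_pct% of all customers.

    Percentages are compared with integer arithmetic to avoid float rounding.
    """
    n = len(labels)
    sizes = Counter(labels)
    if len(sizes) != k:
        return False
    return all(low_pct * n <= 100 * s <= high_pct * n for s in sizes.values())


def customer_distance(a, b, driving_distance, flight_distance):
    """Similarity constraint: driving distance within a country, flight distance across countries.

    driving_distance and flight_distance are hypothetical, caller-supplied functions.
    """
    if a["country"] == b["country"]:
        return driving_distance(a, b)
    return flight_distance(a, b)


# 100 customers in 8 clusters of sizes 15, 15, 15, 15, 10, 10, 10, 10.
sizes = [15, 15, 15, 15, 10, 10, 10, 10]
labels = [c for c, size in enumerate(sizes) for _ in range(size)]
print(satisfies_size_constraint(labels))  # True
```

Passing the two distance functions in as parameters keeps the constraint (which distance applies) separate from how each distance is obtained.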
Another way to categorize clustering constraints considers how firmly the constraints
have to be respected. A constraint is hard if a clustering that violates the constraint
is unacceptable. A constraint is soft if a clustering that violates the constraint is not
preferable but acceptable when no better solution can be found. Soft constraints are also
called preferences.
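One common way to put this distinction into practice, sketched here as an assumption rather than as the book's method, is to use hard constraints as filters and soft constraints as weighted penalties. Constraints are modeled as predicates over a clustering (a dictionary from object id to cluster label). All names and the weighting scheme are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence

Clustering = Dict[Hashable, int]           # object id -> cluster label
Constraint = Callable[[Clustering], bool]  # True if the clustering satisfies it


def soft_penalty(clustering: Clustering, soft: Sequence[Constraint],
                 weights: Sequence[float]) -> float:
    """Sum of the weights of the soft constraints (preferences) that are violated."""
    return sum(w for check, w in zip(soft, weights) if not check(clustering))


def choose_clustering(candidates: List[Clustering], hard: Sequence[Constraint],
                      soft: Sequence[Constraint],
                      weights: Optional[Sequence[float]] = None) -> Optional[Clustering]:
    """Discard candidates that violate any hard constraint; among the rest,
    return one with the smallest soft-constraint penalty (None if none remain)."""
    if weights is None:
        weights = [1.0] * len(soft)
    admissible = [c for c in candidates if all(check(c) for check in hard)]
    if not admissible:
        return None  # every candidate is unacceptable
    return min(admissible, key=lambda c: soft_penalty(c, soft, weights))


candidates = [
    {"x": 0, "y": 0, "z": 1},
    {"x": 0, "y": 1, "z": 1},
    {"x": 0, "y": 0, "z": 0},
]
hard = [lambda c: c["x"] == c["y"]]               # must-link(x, y) treated as hard
soft = [lambda c: len(set(c.values())) >= 2]      # prefer at least two clusters
print(choose_clustering(candidates, hard, soft))  # {'x': 0, 'y': 0, 'z': 1}
```

Returning `None` signals that no candidate satisfies every hard constraint, whereas a violated preference only raises a candidate's penalty.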
Example 11.23 Hard and soft constraints. For AllElectronics, Constraint$_{\text{family}}$ in Example 11.22 is a hard
constraint because splitting a family into different clusters could prevent the company
from providing comprehensive services to the family, leading to poor customer satisfaction.
The constraint on the number of clusters (which corresponds to the number of
customer relationship managers in the company) is also hard. Example 11.22 also has
a constraint to balance the size of clusters. While satisfying this constraint is strongly
preferred, the company is flexible in that it is willing to assign a senior and more capable
customer relationship manager to oversee a larger cluster. Therefore, the constraint
is soft.
Ideally, for a specific data set and a set of constraints, all clusterings satisfy the constraints.
However, it is possible that there may be no clustering of the data set that