3.3.2.1 How Is Our Study Related?
The top-k simple typicality query and the discrete k-median problem both seek
instances in a set that optimize a score defined by their relationship to the other
instances in the set. However, as will be clear in Chapter 4, the functions to
optimize are different, so the methods for the discrete k-median problem cannot be
applied directly to answer top-k typicality queries.
Moreover, in the discrete k-median problem, there is no ranking among the k median
objects, whereas the top-k representative typicality queries as defined return k
objects in ranked order.
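To make the contrast concrete, the following is a minimal sketch of one common formulation of the discrete k-median problem: choose k of the input points so that the sum of distances from every point to its nearest chosen point is minimal. The function names, the one-dimensional data, and the exhaustive search are illustrative assumptions, not the formulation or method used in this study. Note that the result is an unordered set of medians with no ranking among them.

```python
from itertools import combinations

def k_median_cost(points, medians):
    """Sum, over all points, of the distance to the nearest chosen median."""
    return sum(min(abs(p - m) for m in medians) for p in points)

def discrete_k_median(points, k):
    """Exhaustively pick the k input points with the smallest total cost.

    Hypothetical helper for illustration only; the search is exponential in k.
    """
    return min(combinations(points, k),
               key=lambda medians: k_median_cost(points, medians))

# Toy one-dimensional data (assumed): two well-separated groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
medians = discrete_k_median(points, 2)
print(set(medians), k_median_cost(points, medians))
```

On this toy input the search selects the middle point of each group. The output carries no information about which of the two medians is "better", which is the difference from a ranked top-k answer noted above.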
3.3.3 Clustering Analysis
Clustering analysis partitions a set of data objects into smaller sets of similar
objects. A survey of various clustering methods can be found in [81].
Clustering methods can be divided into the following categories. The
partitioning methods partition the objects into k clusters and optimize a selected
partitioning criterion, where k is a user-specified parameter. K-means [82],
K-medoids [83] and CLARANS [84] are examples of this category. The hierarchical
methods perform a series of partitions and group data objects into a tree of clusters.
BIRCH [85], CURE [86] and Chameleon [87] are examples of hierarchical methods.
The density-based methods use a local cluster criterion and identify as clusters the
dense regions of the data space that are separated from one another by regions of
lower density. Examples of density-based methods include DBSCAN [88],
OPTICS [89] and DENCLUE [90]. The grid-based methods use multi-resolution
grid data structures and form clusters by finding dense grid cells. STING [91] and
CLIQUE [92] are examples of grid-based methods.
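As an illustration of the partitioning category described above, here is a minimal sketch of a K-means-style iteration that alternates between assigning points to their nearest center and moving each center to the mean of its cluster. The function name, the fixed iteration count, the initial centers, and the toy two-dimensional data are assumptions made for this example. It is not the implementation from [82].

```python
def k_means(points, centers, iterations=10):
    """Lloyd-style iteration on 2-D points, starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2
                              + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters

# Toy data (assumed): two groups of three points; k = 2 is given by the
# number of initial centers supplied by the user.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
```

The criterion optimized here is the sum of squared distances from each point to its cluster center. The other categories replace this global criterion with tree construction, local density, or grid cell counts.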
3.3.3.1 How Is Our Study Related?
Typicality analysis and clustering analysis both consider similarity among objects.
However, the two problems have different objectives. Clustering analysis focuses
on partitioning data objects, while typicality analysis aims to find representative
instances.
In some studies, cluster centroids are used to represent whole clusters. However,
in general the centroid of a cluster may not be a representative point. For
example, medians are often used as cluster centroids in partitioning clustering
methods, but they are not the most typical objects, as shown in Chapter 4.
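The gap between a median and a representative point can be seen in a small sketch. The example below uses a Gaussian kernel density score as a stand-in for "how typical" a point is. This scoring function, its bandwidth, and the data are assumptions chosen for illustration only; the typicality measures actually used in this study are defined in Chapter 4 and are not reproduced here.

```python
import math
import statistics

def density_score(x, points, bandwidth=0.5):
    """Unnormalized Gaussian kernel density at x (assumed stand-in score)."""
    return sum(math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2)) for p in points)

# Toy one-dimensional cluster (assumed): a tight group near 1, a lone
# point at 5, and a looser group near 9.
points = [1.0, 1.1, 1.2, 5.0, 8.0, 9.0, 10.0]

median = statistics.median(points)
densest = max(points, key=lambda p: density_score(p, points))
print(median, densest)
```

Here the median falls on the isolated point between the two groups, while the point with the highest density score lies inside the tight group.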
In the density-based clustering method DBSCAN [88], the concept of a “core
point” is used to denote a point lying in a high-density region. For a core point o,
there are at least MinPts points lying within a radius Eps of o, where MinPts and Eps