Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

Algorithm: DBSCAN: a density-based clustering algorithm.

Input:

D : a data set containing n objects,

: the radius parameter, and

MinPts : the neighborhood density threshold.

Output: A set of density-based clusters.

Method:

mark all objects as unvisited ;

(1)

(2)

do

(3)

randomly select an unvisited object p ;

(4)

mark p as visited ;

(5)

if the

-neighborhood of p has at least MinPts objects

(6)

create a new cluster C , and add p to C ;

(7)

let N be the set of objects in the

-neighborhood of p ;

(8) for each point p 0 in N

(9) if p 0 is unvisited

(10) mark p 0 as visited ;

(11) if the -neighborhood of p 0 has at least MinPts points,

add those points to N ;

(12) if p 0 is not yet a member of any cluster, add p 0 to C ;

(13) end for

(14) output C ;

(15) else mark p as noise ;

(16) until no object is unvisited ;

Figure10.15 DBSCAN algorithm.

are usually empirically set and difficult to determine, especially for real-world, high-

dimensional data sets. Most algorithms are sensitive to these parameter values: Slightly

different settings may lead to very different clusterings of the data. Moreover, real-world,

high-dimensional data sets often have very skewed distributions such that their intrin-

sic clustering structure may not be well characterized by a single set of global density

parameters.

Note that density-based clusters are monotonic with respect to the neighborhood

threshold. That is, in DBSCAN, for a fixed MinPts value and two neighborhood thresh-

olds,

1 < 2 , a cluster C with respect to

1 and MinPts must be a subset of a cluster

C 0 with respect to

2 and MinPts . This means that if two objects are in a density-based

cluster, they must also be in a cluster with a lower density requirement.

To overcome the difficulty in using one set of global parameters in clustering analy-

sis, a cluster analysis method called OPTICS was proposed. OPTICS does not explicitly

produce a data set clustering. Instead, it outputs a cluster ordering . This is a linear list

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home