Databases Reference
In-Depth Information
of all objects under analysis and represents the
density-based clustering structure
of the
data. Objects in a denser cluster are listed closer to each other in the cluster ordering.
This ordering is equivalent to density-based clustering obtained from a wide range of
parameter settings. Thus, OPTICS does not require the user to provide a specific density
threshold. The cluster ordering can be used to extract basic clustering information (e.g.,
cluster centers, or arbitrary-shaped clusters), derive the intrinsic clustering structure, as
well as provide a visualization of the clustering.
To construct the different clusterings simultaneously, the objects are processed in a
specific order. This order selects an object that is density-reachable with respect to the
lowest
) will be finished first. Based
on this idea, OPTICS needs two important pieces of information per object:
value so that clusters with higher density (lower
0
such
The
core-distance
of
an
object
p
is
the
smallest
value
that
the
0
is the minimum dis-
tance threshold that makes
p
a core object. If
p
is not a core object with respect to
0
-neighborhood of
p
has at least
MinPts
objects. That is,
and
MinPts
, the core-distance of
p
is undefined.
The
reachability-distance
to object
p
from
q
is the minimum radius value that makes
p
density-reachable from
q
. According to the definition of density-reachability,
q
has to be a core object and
p
must be in the neighborhood of
q
. Therefore, the
reachability-distance from
q
to
p
is
max
f
core-distance
(
q
),
dist
(
p
,
q
/g. If
q
is not a
core object with respect to
and
MinPts
, the reachability-distance to
p
from
q
is
undefined.
An object
p
may be directly reachable from multiple core objects. Therefore,
p
may have multiple reachability-distances with respect to different core objects. The
smallest reachability-distance of
p
is of particular interest because it gives the shortest
path for which
p
is connected to a dense cluster.
Example10.8
Core-distance and reachability-distance.
Figure 10.16 illustrates the concepts of core-
distance and reachability-distance. Suppose that
D 6 mm and
MinPts
D 5. The core-
0
, between
p
and the fourth closest data object from
p
.
The reachability-distance of
q
1
from
p
is the core-distance of
p
(i.e.,
distance of
p
is the distance,
0
D 3 mm) because
this is greater than the Euclidean distance from
p
to
q
1
. The reachability-distance of
q
2
with respect to
p
is the Euclidean distance from
p
to
q
2
because this is greater than the
core-distance of
p
.
OPTICS computes an ordering of all objects in a given database and, for each object
in the database, stores the core-distance and a suitable reachability-distance. OPTICS
maintains a list called OrderSeeds to generate the output ordering. Objects in Order-
Seeds are sorted by the reachability-distance from their respective closest core objects,
that is, by the smallest reachability-distance of each object.
OPTICS begins with an arbitrary object from the input database as the current
object,
p
. It retrieves the
-neighborhood of
p
, determines the core-distance, and sets
the reachability-distance to
undefined
. The current object,
p
, is then written to output.