problems: dividing trajectories into homogeneous groups; learning rules to label
any arbitrary trajectory with some tag, to be chosen among a set of predefined
classes; and predicting where an arbitrary trajectory will move next. In the
following we will introduce and discuss each of them.
6.3.1 Trajectory Clustering
In data mining, clustering is defined as the task of creating groups of objects that
are similar to each other, while keeping apart those that are very different.
In most cases, the final result of clustering is a partitioning of the input objects
into groups, called clusters, which means that every object is assigned to exactly
one cluster, and clusters are mutually disjoint. However, exceptions to this general
definition exist and are relatively common.
While the data mining literature is extremely rich in clustering methods for
simple data types, such as numerical vectors or tuples of a relational database,
moving to the realm of trajectories makes it difficult to apply them directly. The
problem is that trajectories are complex objects, and many traditional clustering
methods are tightly bound to the simple, standard data types they were
developed for. In most cases, to use them we need to adapt the existing methods,
or even to reimplement their basic ideas in a completely new, trajectory-oriented
way. We will first look at some solutions that try to reuse existing
methods and frameworks as much as possible; then, we will discuss a few clustering
methods that were tailored to trajectory data in the first place.
Generic Methods with Trajectory Distances
Several clustering methods in the data mining literature are actually clustering
schemata that can be applied to any data type, provided that a notion of similarity
or distance between objects is given. For this reason, they are commonly referred
to as distance-based methods. The key point is that such methods do not look
at the inner structure of the data; they simply try to create groups whose members
lie at small distances from one another. All the knowledge about the structure of
the data and its semantics is encapsulated in the distance function provided,
which summarizes this knowledge through single numerical values, the distances
between pairs of objects; the algorithm itself then combines such summaries
to form groups by following some specific strategy.
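This plug-in design can be sketched with a toy example: a generic single-linkage agglomerative scheme that never inspects the objects it clusters, so the same code works on trajectories once a trajectory distance is supplied. The average point-wise Euclidean distance used here is a deliberately naive stand-in for a real trajectory similarity measure, and it assumes all trajectories have the same number of sample points; the data and parameter values are illustrative.

```python
from itertools import combinations

def traj_distance(t1, t2):
    # Toy trajectory distance: average Euclidean distance between
    # corresponding sample points (assumes equal-length trajectories).
    step = lambda p, q: ((p[0] - q[0])**2 + (p[1] - q[1])**2) ** 0.5
    return sum(step(p, q) for p, q in zip(t1, t2)) / len(t1)

def single_linkage(objects, dist, k):
    """Generic agglomerative clustering: all knowledge about the
    objects' structure is encapsulated in the dist function."""
    clusters = [[i] for i in range(len(objects))]
    # Precompute pairwise distances once; keys are sorted index pairs.
    d = {(i, j): dist(objects[i], objects[j])
         for i, j in combinations(range(len(objects)), 2)}
    while len(clusters) > k:
        # Merge the two clusters containing the closest pair of members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(d[tuple(sorted((i, j)))]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

# Three short trajectories given as sequences of (x, y) samples.
trajs = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 1), (2, 1)],   # close to the first trajectory
    [(0, 9), (1, 9), (2, 9)],   # far from both
]
groups = single_linkage(trajs, traj_distance, k=2)  # → [[0, 1], [2]]
```

Swapping in a different trajectory similarity measure changes the grouping without touching the clustering code, which is exactly the appeal of distance-based schemata.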
To give an idea of the range of alternative clustering schemata available in the
literature, we mention three very common ones: k-means, hierarchical clustering,
and density-based clustering.
k-means (Figure 6.5a) tries to partition all input objects into k clusters, where
k is a parameter given by the user. The method starts from a random partitioning
and then performs several iterations to progressively refine it. During an iteration,
k-means first computes a centroid for each cluster, that is, a representative object
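The k-means refinement loop described above (a random initial partitioning, then alternating centroid computation and reassignment) can be sketched in plain Python on 2-D points. The data, the iteration count, and the guard against empty clusters are illustrative assumptions, not part of the canonical algorithm.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: random initial partitioning, then repeated
    centroid computation and nearest-centroid reassignment."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]   # random initial partitioning
    for _ in range(iters):
        # Step 1: compute each cluster's centroid as the mean of its members.
        centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:                        # guard: reseed empty clusters
                members = [rng.choice(points)]
            centroids.append((sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members)))
        # Step 2: reassign every point to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: (p[0] - centroids[c][0])**2
                                  + (p[1] - centroids[c][1])**2)
                  for p in points]
    return labels

# Two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels = kmeans(points, k=2)
```

On this toy data the loop quickly converges, assigning the three points near the origin to one cluster and the three points near (9, 9) to the other, whatever the initial random partitioning.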