problems: dividing trajectories into homogeneous groups; learning rules to label
any arbitrary trajectory with some tag, to be chosen among a set of predefined
classes; and predicting where an arbitrary trajectory will move next. In the
following we will introduce and discuss each of them.
6.3.1 Trajectory Clustering
In data mining, clustering is defined as the task of creating groups of objects that
are similar to each other, while keeping apart those that are very different.
In most cases, the final result of clustering is a partitioning of the input objects
into groups, called clusters, which means that every object is assigned to exactly
one cluster, and clusters are mutually disjoint. However, exceptions to this general
definition exist and are relatively common.
While the data mining literature is extremely rich in clustering methods for
simple data types, such as numerical vectors or tuples of a relational database,
moving to the realm of trajectories makes it difficult to apply them directly. The
problem is that trajectories are complex objects, and many traditional clustering
methods are tightly bound to the simple, standard data types they were
developed for. In most cases, to use them we need to adapt the existing methods,
or even to reimplement their basic ideas in a completely new, trajectory-oriented
way. We will first look at some solutions that try to reuse existing
methods and frameworks as much as possible; then, we will discuss a few clustering
methods that were tailored to trajectory data in the first place.
Generic Methods with Trajectory Distances
Several clustering methods in the data mining literature are actually clustering
schemata that can be applied to any data type, provided that a notion of similarity
or distance between objects is given. For this reason, they are commonly referred
to as distance-based methods. The key point is that such methods do not look
at the inner structure of the data; they simply try to create groups whose members
lie at small distances from one another. All the knowledge about the structure of
the data and its semantics is encapsulated in the distance function provided,
which summarizes this knowledge through single numerical values, the distances
between pairs of objects; the algorithm itself then combines such summaries
to form groups by following some specific strategy.
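This plug-in design can be sketched with a toy example: a generic single-linkage agglomerative scheme that never inspects the objects it clusters, so the same code works on trajectories once a trajectory distance is supplied. The average point-wise Euclidean distance used here is a deliberately naive stand-in for a real trajectory similarity measure, and it assumes all trajectories have the same number of sample points; the data and parameter values are illustrative.

```python
from itertools import combinations

def traj_distance(t1, t2):
    # Toy trajectory distance: average Euclidean distance between
    # corresponding sample points (assumes equal-length trajectories).
    step = lambda p, q: ((p[0] - q[0])**2 + (p[1] - q[1])**2) ** 0.5
    return sum(step(p, q) for p, q in zip(t1, t2)) / len(t1)

def single_linkage(objects, dist, k):
    """Generic agglomerative clustering: all knowledge about the
    objects' structure is encapsulated in the dist function."""
    clusters = [[i] for i in range(len(objects))]
    # Precompute pairwise distances once; keys are sorted index pairs.
    d = {(i, j): dist(objects[i], objects[j])
         for i, j in combinations(range(len(objects)), 2)}
    while len(clusters) > k:
        # Merge the two clusters containing the closest pair of members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(d[tuple(sorted((i, j)))]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

# Three short trajectories given as sequences of (x, y) samples.
trajs = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 1), (2, 1)],   # close to the first trajectory
    [(0, 9), (1, 9), (2, 9)],   # far from both
]
groups = single_linkage(trajs, traj_distance, k=2)  # → [[0, 1], [2]]
```

Swapping in a different trajectory similarity measure changes the grouping without touching the clustering code, which is exactly the appeal of distance-based schemata.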
To give an idea of the range of alternative clustering schemata available in the
literature, we mention three very common ones: k-means, hierarchical clustering,
and density-based clustering.
k-means (Figure 6.5a) tries to partition all input objects into k clusters, where
k is a parameter given by the user. The method starts from a random partitioning
and then performs several iterations to progressively refine it. During an iteration,
k-means first computes a centroid for each cluster, that is, a representative object
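The k-means refinement loop described above (a random initial partitioning, then alternating centroid computation and reassignment) can be sketched in plain Python on 2-D points. The data, the iteration count, and the guard against empty clusters are illustrative assumptions, not part of the canonical algorithm.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: random initial partitioning, then repeated
    centroid computation and nearest-centroid reassignment."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]   # random initial partitioning
    for _ in range(iters):
        # Step 1: compute each cluster's centroid as the mean of its members.
        centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:                        # guard: reseed empty clusters
                members = [rng.choice(points)]
            centroids.append((sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members)))
        # Step 2: reassign every point to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: (p[0] - centroids[c][0])**2
                                  + (p[1] - centroids[c][1])**2)
                  for p in points]
    return labels

# Two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels = kmeans(points, k=2)
```

On this toy data the loop quickly converges, assigning the three points near the origin to one cluster and the three points near (9, 9) to the other, whatever the initial random partitioning.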