Understanding Human Mobility Using Mobility Data Mining - Mobility Data


to apply a stricter version of the mining algorithm. An example is to use the

T-clustering algorithm (see Chapter 6), reducing the allowed distance between

trajectories or choosing different distance functions that become more precise

at each level, such as starting points, route similarity, and then synchronized

route similarity. In M-Atlas, each step is realized as a sequence of two kinds of

queries: a mining query to perform the clustering step and a relation query for

the entails operation, that is, the selection of trajectories satisfying the cluster

definition. This is depicted as follows:

CREATE MODEL <model_table> MINE AS T-CLUSTERING
  FROM (SELECT t.id, t.object
        FROM <trajectories_table> t)
  SET T-CLUSTERING.METHOD = <distance function> AND ...

CREATE RELATION <relation_table> USING ENTAILS
  FROM (SELECT t.id, t.object, m.id, m.object
        FROM <trajectories_table> t, <model_table> m
        WHERE m.id <> 'noise')

The first query performs a clustering task on all trajectories. The second

query uses both the resulting model table, which represents the clustering, and the

original trajectory data set to find the trajectories that belong to some cluster,

thus excluding the noise, here identified by the “noise” ID. In every step,

the classification of the noise can be either unsupervised, for example using

T-clustering, or supervised, where the user individually selects the interesting

patterns extracted in the last data mining execution.

Chapter 10 illustrates examples of use of this technique on the Milano data

set.
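
The alternation of a mining query and an entails query described above can be pictured in plain Python. What follows is only a sketch under stated assumptions: a simple threshold-linkage grouping stands in for T-clustering (it is not the algorithm of Chapter 6), and the names cluster, entails, progressive_clustering, the two distance functions, and the sample trajectories are all hypothetical and not part of M-Atlas.

```python
from math import dist

NOISE = "noise"  # same role as the 'noise' ID in the queries above

def start_distance(a, b):
    """Distance between the starting points of two trajectories."""
    return dist(a[0], b[0])

def route_distance(a, b):
    """Mean distance between corresponding points (assumes equal length)."""
    return sum(dist(p, q) for p, q in zip(a, b)) / len(a)

def cluster(trajs, distance, eps, min_size=2):
    """Stand-in for the mining query: map each trajectory id to a cluster id,
    or to NOISE when its group has fewer than min_size members."""
    labels, next_id = {}, 0
    for tid in trajs:
        if tid in labels:
            continue
        # Grow the group of trajectories chained by links no longer than eps.
        group, frontier = {tid}, [tid]
        while frontier:
            cur = frontier.pop()
            for other in trajs:
                if other in group or other in labels:
                    continue
                if distance(trajs[cur], trajs[other]) <= eps:
                    group.add(other)
                    frontier.append(other)
        if len(group) >= min_size:
            label, next_id = next_id, next_id + 1
        else:
            label = NOISE
        for member in group:
            labels[member] = label
    return labels

def entails(trajs, labels):
    """Stand-in for the relation query: keep trajectories assigned to a cluster."""
    return {tid: t for tid, t in trajs.items() if labels[tid] != NOISE}

def progressive_clustering(trajs, steps):
    """Apply each (distance, eps) step to the survivors of the previous step."""
    labels = {}
    for distance, eps in steps:
        labels = cluster(trajs, distance, eps)
        trajs = entails(trajs, labels)
    return trajs, labels

# Hypothetical data: "d" starts far away, "c" starts nearby but takes another route.
trajectories = {
    "a": [(0, 0), (1, 0), (2, 0)],
    "b": [(0, 0.1), (1, 0.1), (2, 0.1)],
    "c": [(0, 0.2), (1, 5), (2, 9)],
    "d": [(50, 50), (51, 50), (52, 50)],
}
# Each step uses a more precise distance function than the one before it.
steps = [(start_distance, 1.0), (route_distance, 0.5)]
survivors, labels = progressive_clustering(trajectories, steps)
```

In this toy run the first step discards "d" as noise because of its distant starting point, and the second, stricter step discards "c" because its route diverges, leaving "a" and "b" as the refined cluster.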

Tuning the Parameters

Tuning the parameters of the data mining algorithm is not easy, because it usually

requires several attempts to evaluate the results and adjust the parameter values

accordingly. In general, we must consider two aspects when dealing with the

parameter settings: the number of patterns and the usefulness of the patterns.

Usually the objective of the analyst is to find a small set of useful and meaningful

patterns. Finding parameter values that guarantee this result is

difficult. However, some techniques may be used to guess reasonable

values. Essentially, the idea is to progressively adjust the parameter values based

on the characteristics of the resulting patterns. As an example, let us consider the

T-pattern algorithm presented in Chapter 6, although a similar methodology may

be used for other algorithms. Recall that the parameters are the support threshold,

the time tolerance, and an initial set of spatial regions, and the algorithm finds
