Understanding Human Mobility Using Mobility Data Mining - Mobility Data


the most frequent sequences of regions visited by the users with their traveling

time. The method we propose adjusts the parameters based on the analysis of

the mining results. The idea is to repeat the mining task with different

parameter values, moving toward the analyst's goal according to the characteristics of the

resulting patterns. Therefore, depending on the resulting set of patterns, an

action must be taken as summarized here.

The result set can be:

Small and contains useful patterns: In this case, the objective of the analyst

is reached.

Too big or the algorithm is not terminating: In this case, the support threshold

is probably too low and too many regions become frequent, leading to an

explosion of patterns. There are three possible solutions: (1) increase the

support threshold, (2) review the set of regions and reduce it, or (3) increase

the time tolerance so more patterns will be merged together.

Small, but time intervals are trivial: The time tolerance is too high and makes

the patterns too inclusive, leading to trivial ones. We need to lower the time

tolerance.

Small, but the sequences of regions are trivial: In this case, either the support

threshold is too high and the real patterns remain hidden in the data, or the set of

regions is not meaningful. Some regions may be too large and can therefore

be split at a finer granularity, leading to a better differentiation in

the resulting patterns.
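
The four cases above can be read as a simple feedback loop. The following Python sketch is only an illustration under stated assumptions: the names `Outcome`, `Params`, `adjust`, and `tune`, the multiplicative step `factor`, and the callables `mine` and `assess` are all hypothetical and are not part of the T-pattern algorithm itself. For the "too big" case the sketch picks only the first of the three listed remedies (raising the support threshold), and for trivial sequences it only lowers the support threshold; revising the set of regions is left to the analyst.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from math import ceil, floor


class Outcome(Enum):
    """The four situations described in the text (names are illustrative)."""
    USEFUL = auto()             # small result set with useful patterns
    TOO_BIG = auto()            # too many patterns, or the run did not terminate
    TRIVIAL_TIMES = auto()      # small set, but time intervals are trivial
    TRIVIAL_SEQUENCES = auto()  # small set, but region sequences are trivial


@dataclass(frozen=True)
class Params:
    min_support: int       # support threshold (number of trajectories)
    time_tolerance: float  # time tolerance, in the data's time unit


def adjust(params, outcome, factor=1.5):
    """Return new parameters for the next run; `factor` is an assumed step size."""
    if outcome is Outcome.TOO_BIG:
        # Support threshold probably too low: raise it.
        return replace(params, min_support=ceil(params.min_support * factor))
    if outcome is Outcome.TRIVIAL_TIMES:
        # Time tolerance too high: lower it.
        return replace(params, time_tolerance=params.time_tolerance / factor)
    if outcome is Outcome.TRIVIAL_SEQUENCES:
        # Support threshold probably too high: lower it (never below 1).
        return replace(params, min_support=max(1, floor(params.min_support / factor)))
    return params  # USEFUL: nothing to change


def tune(mine, assess, params, max_rounds=10):
    """Repeat mining until the analyst-supplied `assess` reports a useful result.

    `mine(params)` runs the miner (and is assumed to enforce its own time limit);
    `assess(patterns)` returns an Outcome. Both are supplied by the caller.
    """
    for _ in range(max_rounds):
        patterns = mine(params)
        outcome = assess(patterns)
        if outcome is Outcome.USEFUL:
            return params, patterns
        params = adjust(params, outcome)
    return params, None  # no useful result within the round budget
```

In practice the `assess` step is the analyst's own judgment of the result set, so the loop is better understood as a checklist than as a fully automatic procedure.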

When a reasonable result is obtained, the analyst can apply pruning in the

postprocessing phase to remove some of the patterns, considering additional

properties such as the number of regions in a T-pattern. Parameter setting

in any data-mining algorithm is recognized in the literature as an open issue,

and finding the optimal solution is far from trivial. However, having a methodology

to drive the parameter setting is a first step in searching for a good

solution. Naturally, it could be that in some cases an algorithm is oversensitive

to parameter changes, thus making it extremely difficult to find a good parameter

setting.
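
The postprocessing pruning mentioned above, based on the number of regions in a T-pattern, could look like the following minimal Python sketch. The pattern representation (a tuple of region identifiers paired with a support count) and the function name `prune_by_length` are assumptions made for illustration; a real T-pattern also carries the transition times between regions, which are omitted here.

```python
def prune_by_length(patterns, min_regions=2, max_regions=None):
    """Keep only patterns whose number of regions lies in the requested range.

    `patterns` is assumed to be an iterable of (regions, support) pairs,
    where `regions` is a sequence of region identifiers.
    """
    kept = []
    for regions, support in patterns:
        n = len(regions)
        if n < min_regions:
            continue  # too short to be informative
        if max_regions is not None and n > max_regions:
            continue  # longer than the analyst wants to inspect
        kept.append((regions, support))
    return kept


# Hypothetical mined patterns: (sequence of regions, support)
mined = [
    (("A",), 90),
    (("A", "B"), 40),
    (("A", "B", "C"), 12),
]
long_patterns = prune_by_length(mined, min_regions=3)
```

Other properties of the patterns could be used as pruning criteria in the same way.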

The problem of finding a good initial parameter configuration is also worth

a discussion: the analyst can simply start from a reasonable or random set of

thresholds and then start tuning the parameters as described earlier. Another,

smarter possibility is a parameter estimation performed considering the critical

steps of the algorithm. Consider again the basic step of the T-pattern algorithm:

the detection of frequent regions in the area under analysis makes the support

threshold the most influential parameter for the whole process. We present a

heuristic, data-driven method to estimate the value for this threshold. This is

based on the cumulative frequency distribution of trajectories in the spatial grid

cells. An example on the Milano data set is shown in Figure 7.2a. The points
