Database Reference
In-Depth Information
to apply a stricter version of the mining algorithm. An example is to use the
T-clustering algorithm (see Chapter 6 ), reducing the allowed distance between
trajectories or choosing different distance functions that become more precise
at each level, such as the starting points , route similarity ,andthen synchronized
route similarity . In M-Atlas, each step is realized as a sequence of two kinds of
queries: a mining query to perform the clustering step and a relation query for
the entails operation, that is, the selection of trajectories satisfying the cluster
definition. This is depicted as follows:
CREATE MODEL <model_table> MINE AS T-CLUSTERING
FROM (SELECT t.id, t.object
FROM <trajectories_table> t)
SET T-CLUSTERING.METHOD = <distance function> AND ...
CREATE RELATION <relation_table> USING ENTAILS
FROM (SELECT t.id, t.object, m.id, m.object
FROM <trajectories_table> t, <model_table> m
WHERE m.id<>'noise')
The first query performs a clustering task on all trajectories. The following
query uses both the resulting model table representing the clustering and the
original trajectories data set to find trajectories that belong to some clustering,
thus excluding the noise - here specified by the “noise” ID. In every step
the classification of the noise can be both unsupervised, for example, the T-
clustering, or supervised, where the user individually selects the interesting
patterns extracted in the last data mining execution.
Chapter 10 illustrates examples of use of this technique on the Milano data
set.
Tuning the Parameters
Tuning the parameters of the data mining algorithm is not easy, because it usually
requires several attempts to evaluate the results and adjust the parameters values
accordingly. In general, we must consider two aspects when dealing with the
parameters settings: the number of patterns and the usefulness of the patterns.
Usually the objective of the analyst is to find a small set of useful and meaningful
patterns. Finding a good value for the parameters that guarantees this result is
highly arduous. However, some techniques may be used to guess a reasonable
value. Essentially, the idea is to progressively adjust the parameter values based
on the characteristics of the resulting patterns. As an example, let us consider the
T-pattern algorithm presented in Chapter 6 , although similar methodology may
be used for other algorithms. Recall that the parameters are the support threshold,
the time tolerance, and an initial set of spatial regions and the algorithm finds
Search WWH ::




Custom Search