Understanding Human Mobility Using Mobility Data Mining - Mobility Data


Using a data mining algorithm in a knowledge discovery process is not straightforward: choosing the best algorithm and the best parameter settings to extract meaningful and useful patterns is usually difficult, even for an expert analyst.

In this section we introduce a set of techniques, demonstrated with examples using M-Atlas, to guide a user through the mobility knowledge discovery process by optimizing the data analysis and tuning the parameter settings. Although these techniques have been tailored to mobility data, they can also be applied to data mining in general.

7.2.1 Data Preprocessing

In this section we present some data preprocessing techniques useful in mobility knowledge discovery, illustrating them through the use of M-Atlas.

Data Validation

Data validation is a necessary step to measure how consistent the trajectory data set under analysis is with the real-world phenomena, and how representative it is of them. Here we consider data that have already been cleaned and reconstructed as described in Chapter 2. However, the reconstruction step does not eliminate all possible imperfections in the data, and higher-level errors may still exist. These errors are due to bias in the data (e.g., tracking only a certain category of users) or to technological problems (e.g., an area where the devices do not work), and they can produce unusual and unwanted effects on the analysis results.

To assess the significance of a data set as a proxy of the real mobility phenomena within a certain area, the trajectory data set (as a set of spatio-temporal points) can be compared against a "ground truth", such as survey data consisting of interviews about mobility habits, for example conducted by phone, or other forms of a priori knowledge. However, this comparison must take into account the differences between the two data sets, starting with the population each one covers. First, a data set collected from private cars covers only vehicular movements, whereas surveys usually include all kinds of movement, including pedestrians and public transportation. Second, the automatic collection procedure and the cleaning step applied to the car data set ensure that all movements are correctly captured, whereas surveys leave room for omissions or distortions. Finally, the trajectory data provide no explicit semantic information about the purpose of movements, such as the final destination and the profiles of the citizens involved, whereas surveys explicitly collect this information. The two samples may also differ significantly in size, which can alter the picture of reality that each data set represents. One method to understand whether the data are consistent with the ground truth is to replicate the same statistical analysis on each data set and compare the results. This
