donated for research purposes by Octo Telematics Italia S.p.A.,¹ the leader in
this sector in Europe. We use two GPS data sets. The first, Milano2007, describes
approximately 17,000 cars tracked during one week (from April 1 through April
7, 2007) of ordinary mobile activity in the urban area of the city of Milan (a
20 km × 20 km square). The second, Pisa2010, contains approximately 40,000
cars tracked during 5 weeks (from June 14 through July 18, 2011) in coastal
Tuscany, a 100 km × 100 km square centered on the city of Pisa. On average,
the GPS receivers record one observation every 30 seconds. Overall, Milano2007
consists of approximately 2 million observations and Pisa2010 of approximately 20
million observations, each consisting of a quadruple (id, lat, long, t), where id
is the car identifier, (lat, long) are the spatial coordinates, and t is the time of the
observation. The car identifiers are pseudonymized in order to achieve a basic
level of anonymity.
The resolution of the spatial coordinates is 10⁻⁶ degrees, and the error
of the positioning system is estimated at 10-20 m in normal conditions. The
temporal resolution is in seconds. All the observations of the same car id over
the entire observation period are chained together in increasing temporal order
into a global trajectory of car id. Using the trajectory reconstruction techniques
presented in Chapter 2, we obtained approximately 200,000 distinct travels in
Milano2007, and approximately 1,500,000 distinct travels in Pisa2010.
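
The chaining step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Observation, build_global_trajectories, and split_into_travels are hypothetical, and the gap-based split (with an arbitrary max_gap_s threshold) is a common heuristic used here as a stand-in, not the reconstruction technique of Chapter 2.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One GPS observation: the quadruple (id, lat, long, t)."""
    car_id: str   # pseudonymized car identifier
    lat: float    # latitude in degrees
    lon: float    # longitude in degrees
    t: int        # time of the observation, in seconds


def build_global_trajectories(observations):
    """Group observations by car and sort each group by increasing time."""
    by_car = defaultdict(list)
    for obs in observations:
        by_car[obs.car_id].append(obs)
    return {
        car_id: sorted(points, key=lambda o: o.t)
        for car_id, points in by_car.items()
    }


def split_into_travels(trajectory, max_gap_s=1200):
    """Cut a time-ordered trajectory wherever two consecutive observations
    are more than max_gap_s seconds apart.

    Assumption: a long temporal gap marks a stop. The threshold value is
    illustrative only and does not come from the text.
    """
    travels = []
    current = []
    for obs in trajectory:
        if current and obs.t - current[-1].t > max_gap_s:
            travels.append(current)
            current = []
        current.append(obs)
    if current:
        travels.append(current)
    return travels
```

Calling build_global_trajectories on the raw observations yields one time-ordered list per car; split_into_travels can then be applied to each list to obtain candidate travels.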
10.3 Data Understanding
Since the data we can use for analysis are a sample of the real population, as a
first step we need to evaluate their representativeness and statistical significance.
We do that through a set of statistical evaluations that analyze the distributions
of typical movement dynamics properties, such as speed, length of each trip, and
temporal location. In some cases the same measurements are also estimated by
traditional transportation methods, so a comparison is possible in order to
assess how meaningful the data sample is as a proxy for real mobility phenomena.
We compared the Milano2007 data set against the survey data collected
in 2005-2006 by the local mobility agency AMA,² although the two data sources
differ in both the sampled population and the kind of information collected:
the mobility reports are obtained through a survey campaign and include flows
not only of private vehicles but also of public transportation and pedestrians.
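
As a rough illustration of the per-travel properties mentioned above (length, speed, and temporal location), the following Python sketch computes them for travels given as lists of (lat, long, t) tuples. The function names are hypothetical. It assumes that travel length can be approximated by summing great-circle (haversine) distances between consecutive points, and that t counts seconds from a local midnight so that the hour of day is (t mod 86400) / 3600; neither assumption comes from the text.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_length_km(points):
    """Sum of distances between consecutive (lat, long, t) points."""
    return sum(
        haversine_km(p[0], p[1], q[0], q[1])
        for p, q in zip(points, points[1:])
    )


def travel_mean_speed_kmh(points):
    """Average speed over the whole travel, or None if it has no duration."""
    duration_s = points[-1][2] - points[0][2]
    if duration_s <= 0:
        return None
    return travel_length_km(points) / (duration_s / 3600.0)


def start_hour_histogram(travels):
    """Count travels by the hour of day in which they start.

    Assumption: t is measured in seconds from a local midnight.
    """
    return Counter((travel[0][2] % 86400) // 3600 for travel in travels)
```

The resulting lengths, speeds, and start-hour counts can then be plotted as distributions and set against the corresponding survey figures where those are available.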
Since the basic components of mobility data are the spatial and temporal
dimensions, we focus on the statistical analysis of these dimensions separately.
First, we try to understand when people are moving during the day. In particular,
1 http://www.octotelematics.it
2 http://www.ama-mi.it/english
 
 