Consequently, the sensor readings are inherently uncertain and probabilistic. In this
example, we consider three different application scenarios in traffic monitoring.
Scenario 1: Finding the top-k speeding records at a certain time.
Table 1.1 lists a set of synthesized records of vehicle speeds recorded by sensors.
Each sensor reports the location, time, and speed of vehicles passing the sensor. In
some locations where the traffic is heavy, multiple sensors are deployed to improve
the detection quality. Two sensors in the same location (e.g., S206 and S231, as well
as S063 and S732 in Table 1.1) may detect the vehicle speed at the same time, such
as records R2 and R3, as well as R5 and R6. In such a case, if the speeds reported
by multiple sensors are inconsistent, at most one sensor can be correct.
The uncertain data in Table 1.1(a) carries the possible worlds semantics [23,
12, 24, 7] as follows. The data can be viewed as the summary of a set of possible
worlds, where a possible world contains a set of tuples governed by some underlying
generation rules which constrain the presence of tuples. In Table 1.1, the fact that
R2 and R3 cannot be true at the same time can be captured by a generation rule
R2 ⊕ R3. Another generation rule is R5 ⊕ R6. Table 1.1(b) shows all possible worlds
and their existence probability values.
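The possible worlds semantics described above can be illustrated with a minimal sketch. The records, speeds, and probabilities below are hypothetical and are not the values of Table 1.1; the sketch also assumes that records in different generation rules are independent and that the membership probabilities within one rule sum to at most 1.

```python
from itertools import product

# Hypothetical records: id -> (speed, membership probability).
# These are illustrative values, NOT the contents of Table 1.1.
records = {
    "R1": (130, 0.3),
    "R2": (120, 0.4),
    "R3": (110, 0.5),
    "R4": (105, 1.0),
}

# Each generation rule lists mutually exclusive records (at most one appears).
# An independent record is treated as a rule with a single member.
rules = [["R1"], ["R2", "R3"], ["R4"]]


def possible_worlds(records, rules):
    """Enumerate (members, probability) for every possible world."""
    per_rule = []
    for rule in rules:
        # Option: exactly one record of the rule appears.
        options = [((rid,), records[rid][1]) for rid in rule]
        # Option: no record of the rule appears (remaining probability mass).
        none_prob = 1.0 - sum(p for _, p in options)
        if none_prob > 1e-12:
            options.append(((), none_prob))
        per_rule.append(options)

    worlds = []
    for combo in product(*per_rule):
        members = tuple(rid for chosen, _ in combo for rid in chosen)
        prob = 1.0
        for _, p in combo:
            prob *= p  # rules are assumed independent of each other
        worlds.append((members, prob))
    return worlds


worlds = possible_worlds(records, rules)
for members, prob in worlds:
    print(members, round(prob, 4))
```

Under these assumptions no enumerated world contains both R2 and R3, and the existence probabilities of all worlds sum to 1.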
Ranking queries can be used to analyze uncertain traffic records. For example,
it is interesting to find the top-2 speeding records so that actions can be taken
to improve the situation. However, the answer to this question may differ from one
possible world to another. What a ranking query means on uncertain data in such
an application scenario and how to answer a ranking query efficiently are studied
in Chapter 5 of this book.
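To see concretely why the top-2 answer depends on the possible world, consider the following sketch. The speeds and the list of worlds are hypothetical (they are not taken from Table 1.1), and grouping worlds by their answer is only one simple way to inspect the disagreement, not the query semantics developed later in the book.

```python
from collections import defaultdict

# Hypothetical speeds and possible worlds (illustrative values only).
speeds = {"R1": 130, "R2": 120, "R3": 110, "R4": 105}
worlds = [
    (("R1", "R2", "R4"), 0.12),
    (("R1", "R3", "R4"), 0.15),
    (("R1", "R4"), 0.03),
    (("R2", "R4"), 0.28),
    (("R3", "R4"), 0.35),
    (("R4",), 0.07),
]


def top_k(members, k):
    """Return the k fastest records present in one possible world."""
    ranked = sorted(members, key=lambda rid: speeds[rid], reverse=True)
    return tuple(ranked[:k])


# Accumulate the probability mass of the worlds that give each top-2 answer.
answers = defaultdict(float)
for members, prob in worlds:
    answers[top_k(members, 2)] += prob

for answer, prob in sorted(answers.items(), key=lambda item: -item[1]):
    print(answer, round(prob, 4))
```

In this toy data set every world yields a different top-2 list, so no single answer is correct in all worlds.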
Scenario 2: Monitoring top-k speeding spots in real time.
Table 1.1 contains a set of uncertain records at a certain time. In some
applications, a speed sensor will keep sending traffic records to a central server
continuously. Therefore, the speeds recorded by each sensor can be modeled as a data
stream.
For example, the ARTIMIS center in Cincinnati, Ohio/Kentucky reports the
speed, volume and occupancy of road segments every 30 seconds [25]. Table 1.2
is a piece of sample data from the ARTIMIS Data Archives (see footnote 1).
Consider a simple continuous query: continuously report the top-2 monitoring
points in the road network with the fastest vehicle speeds in the last 5 minutes.
One interesting and subtle issue is how we should measure the vehicle speed at a
monitoring point. Can we use a simple statistic such as the average, median,
maximum, or minimum speed in the last 5 minutes? No single such statistic may
capture the distribution of the data well. Therefore, new ranking criteria for such
uncertain data streams are highly desirable. Moreover, it is important to develop
efficient query monitoring algorithms that suit the application's needs.
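The limitation of simple statistics can be illustrated with a small sketch of a 5-minute sliding window. The class name SlidingWindow, the two monitoring points, and their readings are hypothetical; the 30-second reporting interval follows the ARTIMIS description above. This is not the stream model of Chapter 6, only an illustration of the problem.

```python
from collections import deque
from statistics import mean, median

WINDOW_SECONDS = 300  # the "last 5 minutes" of the example query


class SlidingWindow:
    """Keeps the (timestamp, speed) readings of one monitoring point."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.readings = deque()

    def add(self, timestamp, speed):
        self.readings.append((timestamp, speed))
        # Evict readings that are no longer within the window.
        cutoff = timestamp - self.window_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            self.readings.popleft()

    def summary(self):
        speeds = [speed for _, speed in self.readings]
        return {
            "mean": mean(speeds),
            "median": median(speeds),
            "max": max(speeds),
            "min": min(speeds),
        }


# Hypothetical readings, one every 30 seconds for 5 minutes.
point_a = SlidingWindow()
point_b = SlidingWindow()
for i in range(10):
    point_a.add(30 * i, 60)                    # steady traffic
    point_b.add(30 * i, 30 if i < 5 else 90)   # slow, then fast

print("A:", point_a.summary())
print("B:", point_b.summary())
```

The two points have the same mean and median, yet ranking by maximum favors B and ranking by minimum favors A, so the choice of statistic alone decides the answer.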
In Chapter 6, we will introduce an uncertain data stream model and a continuous
probabilistic threshold top-k query to address this application scenario. Efficient
stream-specific query evaluation methods will be discussed.
1 http://www.its.dot.gov/JPODOCS/REPTS_TE/13767.html