Introduction - Ranking Queries on Uncertain Data


Consequently, the sensor readings are inherently uncertain and probabilistic. In this
example, we consider three different application scenarios in traffic monitoring.

Scenario 1: Finding the top-k speeding records at a certain time.

Table 1.1 lists a set of synthesized records of vehicle speeds recorded by sensors.
Each sensor reports the location, time, and speed of vehicles passing the sensor. In
some locations where the traffic is heavy, multiple sensors are deployed to improve
the detection quality. Two sensors in the same location (e.g., S206 and S231, as well
as S063 and S732 in Table 1.1) may detect the vehicle speed at the same time, such
as records R2 and R3, as well as R5 and R6. In such a case, if the speeds reported
by multiple sensors are inconsistent, at most one sensor can be correct.

The uncertain data in Table 1.1(a) carries the possible worlds semantics [23,
12, 24, 7] as follows. The data can be viewed as the summary of a set of possible
worlds, where a possible world contains a set of tuples governed by some underlying
generation rules which constrain the presence of tuples. In Table 1.1, the fact that
R2 and R3 cannot be true at the same time can be captured by a generation rule
R2 ⊕ R3. Another generation rule is R5 ⊕ R6. Table 1.1(b) shows all possible worlds
and their existence probability values.
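
The possible worlds semantics described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the membership
probabilities below are hypothetical stand-ins (the values of Table 1.1 are not
reproduced here), tuples in different generation rules are assumed independent,
and the function name possible_worlds is made up for this sketch.

```python
from itertools import product

def possible_worlds(probs, rules):
    """Enumerate possible worlds as (set of tuple ids, existence probability).

    probs: dict mapping tuple id -> membership probability (hypothetical values).
    rules: list of generation rules, each a list of mutually exclusive tuple ids.
    Tuples that appear in no rule are treated as independent singleton groups.
    """
    in_rule = {t for rule in rules for t in rule}
    groups = [list(rule) for rule in rules]
    groups += [[t] for t in probs if t not in in_rule]

    options = []
    for group in groups:
        # Exactly one tuple of the group appears, or none of them does.
        opts = [(t, probs[t]) for t in group]
        none_prob = 1.0 - sum(probs[t] for t in group)
        if none_prob > 1e-12:
            opts.append((None, none_prob))
        options.append(opts)

    worlds = []
    for combo in product(*options):
        members = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, p in combo:
            prob *= p
        worlds.append((members, prob))
    return worlds

# Hypothetical membership probabilities for six records.
probs = {"R1": 0.3, "R2": 0.4, "R3": 0.5, "R4": 1.0, "R5": 0.8, "R6": 0.2}
# Generation rules from the text: R2 and R3 exclude each other, as do R5 and R6.
rules = [["R2", "R3"], ["R5", "R6"]]

worlds = possible_worlds(probs, rules)
for members, prob in sorted(worlds, key=lambda w: -w[1]):
    print(sorted(members), round(prob, 4))
```

With these assumed numbers the enumeration yields 12 possible worlds whose
probabilities sum to 1, and no world contains both R2 and R3 or both R5 and R6.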

Ranking queries can be used to analyze uncertain traffic records. For example,
it is interesting to find out the top-2 speeding records so that actions can be taken
to improve the situation. However, in different possible worlds the answers to this
question may be different. What a ranking query means on uncertain data in such
an application scenario and how to answer a ranking query efficiently are studied
in Chapter 5 of this book.
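
To make the point about differing answers concrete, the following Python sketch
evaluates a top-2 query in each possible world separately. Everything in it is
assumed for illustration: the four records, their speeds, and the world
probabilities are hypothetical (they are consistent with one exclusion rule
between R2 and R3), and the helper names are not from the book. It does not
show the ranking semantics developed in Chapter 5.

```python
from collections import defaultdict

# Hypothetical speeds for four illustrative records (not taken from Table 1.1).
speeds = {"R1": 130, "R2": 120, "R3": 110, "R4": 105}

# Hypothetical possible worlds with existence probabilities.
# R2 and R3 never appear together; R4 appears in every world.
worlds = [
    ({"R1", "R2", "R4"}, 0.12),
    ({"R1", "R3", "R4"}, 0.15),
    ({"R1", "R4"}, 0.03),
    ({"R2", "R4"}, 0.28),
    ({"R3", "R4"}, 0.35),
    ({"R4"}, 0.07),
]

def top_k(world, k):
    """Return the k fastest records of a single (certain) world, fastest first."""
    return tuple(sorted(world, key=lambda r: speeds[r], reverse=True)[:k])

def answer_distribution(worlds, k):
    """Sum the probabilities of the worlds that produce each distinct top-k answer."""
    dist = defaultdict(float)
    for members, prob in worlds:
        dist[top_k(members, k)] += prob
    return dict(dist)

answers = answer_distribution(worlds, 2)
for answer, prob in sorted(answers.items(), key=lambda kv: -kv[1]):
    print(answer, round(prob, 2))
```

In this toy instance every world produces a different top-2 answer, and even the
most probable one, (R3, R4), holds with probability only 0.35. This is why a
ranking query on uncertain data needs a carefully defined meaning.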

Scenario 2: Monitoring top-k speeding spots in real time.

Table 1.1 contains a set of uncertain records at a certain time. In some
applications, a speed sensor will keep sending traffic records to a central server
continuously. Therefore, the speeds recorded by each sensor can be modeled as a data
stream.

For example, the ARTIMIS center in Cincinnati, Ohio/Kentucky reports the
speed, volume and occupancy of road segments every 30 seconds [25]. Table 1.2
is a piece of sample data from ARTIMIS Data Archives (see footnote 1).

Consider a simple continuous query: continuously reporting a list of the top-2
monitoring points in the road network with the fastest vehicle speeds in the last
5 minutes. One interesting and subtle issue is how we should measure the vehicle
speed at a monitoring point. Can we use some simple statistics like the
average/median/maximum/minimum speed in the last 5 minutes? Each of such simple
statistics may not capture the distribution of the data well. Therefore, new ranking
criteria for such uncertain data streams are highly desirable. Moreover, it is
important to develop efficient query monitoring algorithms that suit the application
needs.
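
The limitation of simple statistics can be seen in a small Python sketch of a
5-minute sliding window over 30-second readings. The class SlidingWindow, the
two monitoring points, and all speed values are hypothetical and chosen only to
illustrate the issue; this is not the stream model introduced in Chapter 6.

```python
from collections import deque
from statistics import mean, median

class SlidingWindow:
    """Keep the (timestamp, speed) readings of the last span_seconds seconds."""

    def __init__(self, span_seconds=300):
        self.span = span_seconds
        self.items = deque()

    def add(self, ts, speed):
        self.items.append((ts, speed))
        # Evict readings that are span_seconds old or older.
        while self.items and self.items[0][0] <= ts - self.span:
            self.items.popleft()

    def summary(self):
        speeds = [s for _, s in self.items]
        return {
            "mean": mean(speeds),
            "median": median(speeds),
            "max": max(speeds),
            "min": min(speeds),
        }

# Two hypothetical monitoring points, one reading every 30 seconds.
point_a = SlidingWindow()
point_b = SlidingWindow()
for i in range(10):
    ts = 30 * i
    point_a.add(ts, 60)                        # steady traffic
    point_b.add(ts, 30 if i % 2 == 0 else 90)  # alternating slow and fast

print("A:", point_a.summary())
print("B:", point_b.summary())
```

Both points have the same mean and median speed (60) although their readings are
distributed very differently, while ranking by maximum or minimum alone would
look only at the extremes of point B. No single statistic summarizes the window
well.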

In Chapter 6, we will introduce an uncertain data stream model and a continuous
probabilistic threshold top-k query to address this application scenario. Efficient
stream-specific query evaluation methods will be discussed.

1 http://www.its.dot.gov/JPODOCS/REPTS_TE/13767.html
