Introduction - Ranking Queries on Uncertain Data


# Time    Samp  Speed  Volume  Occupancy
00:01:51  30    47     575     6
00:16:51  30    48     503     5
00:31:51  30    48     503     5
00:46:51  30    49     421     4
01:01:52  30    48     274     5
01:16:52  30    42     275     14
...

Table 1.2 Data for segment SEGK 715001 for 07/15/2001 in ARTIMIS Data Archives (Number of Lanes: 4).

Example 1.1 demonstrates the great need for ranking queries in uncertain data analysis. In traditional data analysis for deterministic data, ranking queries play an important role by selecting the subset of records of interest according to user specified criteria. With the rapidly increasing amount of uncertain data, ranking queries have become even more important, since the uncertainty in data not only increases the scale of data but also introduces more difficulties in understanding and analyzing the data.

1.2 Challenges

While being useful in many important applications, ranking queries on uncertain data pose grand challenges to query semantics and processing.

Challenge 1 What are the uncertain data models that we need to adopt?

Example 1.1 illustrates three different application scenarios in ranking the information obtained from traffic sensors. This not only shows the great use of ranking queries on uncertain data, but also raises a fundamental question: how can we develop uncertain data models that capture the characteristics of data and suit application needs?

In particular, we need to consider the following three aspects. First, is the uncertain data static or dynamic? Second, how should we describe the dependencies among uncertain data objects? Third, how can we handle complex uncertain data like a graph?

Challenge 2 How to formulate probabilistic ranking queries?

As shown in Example 1.1, different ranking queries on uncertain data can be asked according to different application needs. In Scenario 1, we want to select the records ranked in top-k with high confidence, while in Scenario 2, the objective is to find the sensors whose records are ranked in top-k with probabilities no smaller than a threshold in a time window. Last, in Scenario 3, we are interested in finding paths such that the sums of the (uncertain) travel time along the path are ranked at the top.
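To make the idea of "ranked in top-k with probability no smaller than a threshold" concrete, the following is a minimal sketch, not the method developed in this book. It assumes a deliberately simple model that the text above does not specify: each record exists independently with a given probability, and a record's top-k probability is the total probability of the possible worlds in which it ranks among the k highest scores. The record names, scores, probabilities, and the function names topk_probabilities and threshold_topk are all hypothetical, and the brute-force enumeration is only practical for a handful of records.

```python
from itertools import product


def topk_probabilities(records, k):
    """Return {name: probability that the record ranks in the top-k}.

    records: list of (name, score, prob) tuples. ASSUMPTION: each record
    exists independently with probability `prob` (illustrative model only).
    """
    result = {name: 0.0 for name, _, _ in records}
    # Enumerate every possible world: each record is either present or absent.
    for present in product([False, True], repeat=len(records)):
        world_prob = 1.0
        for (_, _, p), here in zip(records, present):
            world_prob *= p if here else (1.0 - p)
        # Rank the records that exist in this world by score, highest first.
        world = [r for r, here in zip(records, present) if here]
        world.sort(key=lambda r: r[1], reverse=True)
        # Credit this world's probability to each record in its top-k.
        for name, _, _ in world[:k]:
            result[name] += world_prob
    return result


def threshold_topk(records, k, threshold):
    """Names of records whose top-k probability is at least `threshold`."""
    probs = topk_probabilities(records, k)
    return sorted(name for name, p in probs.items() if p >= threshold)


# Hypothetical sensor records: (name, speed, existence probability).
records = [("a", 50, 0.5), ("b", 40, 1.0), ("c", 30, 0.75)]
print(topk_probabilities(records, 2))
print(threshold_topk(records, 2, 0.5))
```

The enumeration visits 2^n possible worlds for n records, which is why it serves only as an illustration of the query semantics rather than as a practical evaluation strategy.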

Therefore, it is important to develop meaningful ranking queries according to different application interests. Moreover, the probability associated with each data
