Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data

Database Reference

In-Depth Information

Figure 4.10(d) shows the scalability of the four algorithms with respect to

database size. RT has a linear scalability. LT3 clearly has the better performance

and scalability than DLTA on large data sets.

The trends for discriminative typicality queries and representative typicality

queries are similar, as shown in Figure 4.11 and Figure 4.12, respectively.

In summary, as RT has linear complexity, when runtime is the only concern,

RT should be used. While DLTA and LT3 are much more scalable than the exact

algorithm and are more accurate than RT, they are good when both accuracy and

efficiency matter. LT3 has the better efficiency and scalability than DLTA, while

achieving comparable accuracy to DLTA.

4.6 Summary

In this chapter, we discussed how to compute the answers to top- k typicality queries

defined in Section 2.2.1.

•

We applied kernel density estimation to compute the likelihood of instances in an

uncertain object and presented an exact query answering algorithm. We showed

that the complexity of answering top- k typicality queries is quadratic.

•

We developed a linear-time randomized algorithm which adopts a tournament

mechanism and computes the approximation answers to top- k typicality queries.

•

The randomized algorithm does not provide a quality guarantee in the approxi-

mated answers, therefore, we further explored the locality nature of kernel den-

sity estimation and proposed two approximation algorithms that can provide

good quality guarantees.

By a systematic empirical evaluation using both real data sets and synthetic data

sets, we illustrated the effectiveness of top- k typicality queries, and verified the ac-

curacy and the efficiency of our methods.

Search WWH ::

Custom Search

Home