Top-k Typicality Queries on Uncertain Data - Ranking Queries on Uncertain Data


throws or assists, which are skills popular among guards but may not be common in other players.

In Table 4.5, we list the answers to the top-10 representative typicality query on guards. Compared with the answers to a top-10 simple typicality query listed in Table 4.5, the top-10 representatively typical guards are quite different from each other in 3-point throws and assists. For example, Ronald Murray, the most typical guard, represents the NBA guards who are experienced and perform well, while Andre Owens, the second most representatively typical guard, represents a group of NBA guards whose performances are relatively poorer.

We use the NBA data set to examine the differences among medians, means, and typical instances. The results are shown in Table 4.6. The simple typicality scores of the medians and the means are often substantially lower than those of the most typical players, which suggests that geometric centers may not reflect the probability density distribution. A typical player can be very different from the median player and the mean. For example, Ronald Murray is identified as the most typical guard, but Charlie Bell is the median guard. The technical statistics show that Murray makes fewer rebounds than Bell, but contributes more assists. In this respect, Murray is more typical than Bell as a guard. Moreover, Ronald Murray played 76 games in the season, while Charlie Bell played only 59 games. If we take the range 76 ± 6 = [70, 82], there are 92 guards whose numbers of games played fall in the range, while only 31 guards have numbers of games played in the range 59 ± 6 = [53, 65]. That is, many more guards played a number of games similar to Murray's.
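The contrast between the median and the most typical instance can be illustrated with a minimal sketch. It assumes that a one-dimensional Gaussian kernel density estimate is an acceptable stand-in for a simple typicality score; the games-played values, the bandwidth of 6, and the helper names `count_within` and `kde_density` are hypothetical and are not taken from the NBA data set or from Table 4.6.

```python
import math
import statistics

def count_within(values, center, radius):
    """Count the values in the closed range [center - radius, center + radius]."""
    return sum(1 for v in values if center - radius <= v <= center + radius)

def kde_density(x, values, bandwidth):
    """Gaussian kernel density estimate at x (assumed stand-in for typicality)."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-((x - v) ** 2) / (2 * bandwidth ** 2)) for v in values)
    return total / norm

# Hypothetical games-played values, NOT the NBA data: a spread-out group of
# part-time players and a dense group close to a full 82-game season.
games = [5, 12, 20, 28, 35, 41, 48, 53, 57, 58, 59,
         70, 72, 74, 75, 76, 76, 77, 78, 80, 82]

BANDWIDTH = 6  # assumed value, chosen to match the +/- 6 range used in the text

median_games = statistics.median(games)
most_typical = max(set(games), key=lambda g: kde_density(g, games, BANDWIDTH))

near_typical = count_within(games, most_typical, 6)
near_median = count_within(games, median_games, 6)

print("median:", median_games, "neighbours within 6 games:", near_median)
print("most typical:", most_typical, "neighbours within 6 games:", near_typical)
```

In this toy sample the median sits in a sparse part of the distribution, while the value with the highest estimated density lies in the dense group, so it has more neighbours within ± 6 games. This mirrors the Murray and Bell comparison only qualitatively.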

To compare typicality analysis with clustering analysis, we compute 2 clusters of all guards using the k-medoids clustering algorithm [83]. The median players of the clusters are {Ronald Murray, Stephon Marbury}, whose group typicality score is 0.105. The group typicality score of the top-2 most representatively typical guards in Table 4.5 (i.e., {Ronald Murray, Andre Owens}) is 0.161. The set of players found by the clustering analysis is only 65% as representative as the set of players found by the top-k representative typicality queries.
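The 65% figure follows directly from the two group typicality scores quoted above. As a quick check, using only those two numbers (the variable names are illustrative):

```python
# Group typicality scores quoted in the text.
clustering_score = 0.105      # {Ronald Murray, Stephon Marbury}, from k-medoids
representative_score = 0.161  # {Ronald Murray, Andre Owens}, top-2 query answer

# Relative representativeness of the clustering result.
ratio = clustering_score / representative_score
ratio_percent = round(ratio * 100)

print(f"{ratio:.3f} -> about {ratio_percent}%")
```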

4.5.2 Approximation Quality

To evaluate the query answering quality on uncertain data with a large number of instances, we use the Quadraped Animal Data Generator, also from the UCI Machine Learning Database Repository, to generate synthetic data sets with up to 25 numeric attributes. We test the approximation quality of the RT (randomized tournament) method, the DLTA (direct local typicality approximation) method, and the LT3 (local typicality approximation using tournaments) method on top-k simple typicality queries, top-k discriminative typicality queries, and top-k representative typicality queries, respectively. The results are reported in the rest of this section.

First, we test RT, DLTA, and LT3 for top-k simple typicality queries. To measure the error made by an approximation algorithm, we use the following error rate measure. For a top-k typicality query Q, let A be the set of k instances returned
