Database Reference
throws or assists, which are skills common among guards but may not be common
among other players.
In Table 4.5, we list the answers to the top-10 representative typicality query
on guards. Compared with the answers to a top-10 simple typicality query listed in
Table 4.5, the top-10 representatively typical guards are quite different from each
other in 3-point throws and assists. For example, Ronald Murray, the most typical
guard, represents the NBA guards who are experienced and perform well, while
Andre Owens, the second most representatively typical guard, represents a group of
NBA guards whose performance is relatively poorer.
We use the NBA data set to examine the differences among medians, means, and
typical instances. The results are shown in Table 4.6. The simple typicality scores of
the medians and the means are often substantially lower than those of the most typical
players, which confirms that geometric centers may not reflect the probability density
distribution. A typical player can be very different from the median player and the
mean. For example, Ronald Murray is identified as the most typical guard, but
Charlie Bell is the median guard. The technical statistics show that Murray makes fewer
rebounds than Bell, but contributes more assists. To this extent, Murray is more
typical than Bell as a guard. Moreover, Ronald Murray played 76 games in the season,
while Charlie Bell only played 59 games. If we take the range 76 ± 6 = [70, 82], then
there are 92 guards whose numbers of games played are in the range, while there are
only 31 guards whose numbers of games played are in the range 59 ± 6 = [53, 65].
That is, many more guards played a number of games similar to Murray's.
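The comparison above can be illustrated with a minimal sketch. It counts how many values fall within ± 6 of a center, and scores each value with a Gaussian kernel density estimate. The function names `count_within` and `kde_score`, the bandwidth, and the `games` list are all hypothetical: the list is toy data rather than the NBA data set, and the kernel density estimate is only an assumed stand-in for the simple typicality score.

```python
import math
import statistics

def count_within(values, center, half_width):
    """Count values v with center - half_width <= v <= center + half_width."""
    return sum(1 for v in values if abs(v - center) <= half_width)

def kde_score(values, x, bandwidth):
    """Gaussian kernel density estimate at x (assumed stand-in for simple typicality)."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-((x - v) ** 2) / (2.0 * bandwidth ** 2)) for v in values)

# Hypothetical games-played values, NOT the NBA data set.
games = [5, 12, 20, 28, 35, 42, 50, 59, 72, 74, 75, 76, 77, 78, 80]
bandwidth = 6.0  # hypothetical smoothing parameter

median_games = statistics.median(games)
# The instance with the highest estimated density plays the role of the most typical one.
most_typical = max(games, key=lambda g: kde_score(games, g, bandwidth))

print("median:", median_games,
      "neighbors within 6:", count_within(games, median_games, 6))
print("most typical:", most_typical,
      "neighbors within 6:", count_within(games, most_typical, 6))
```

In this toy sample the median lies in a sparse region, whereas the value with the highest estimated density lies inside the dense group, which mirrors the Murray and Bell comparison.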
To compare the difference between typicality analysis and clustering analysis, we
compute 2 clusters of all guards using the k-medoids clustering algorithm [83]. The
median players of the clusters are {Ronald Murray, Stephon Marbury}, whose group
typicality score is 0.105. The group typicality score of the top-2 most
representatively typical guards in Table 4.5 (i.e., {Ronald Murray, Andre Owens}) is 0.161.
The set of players found by the clustering analysis is only 65% as representative as
the set of players found by the top-k representative typicality queries.
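As a rough illustration of the clustering side of this comparison, the sketch below runs a simple alternating k-medoids heuristic with k = 2 and then recomputes the 65% figure from the two reported scores. It is not necessarily the variant of k-medoids described in [83]. The helper names, the naive initialization, and the `players` feature vectors are hypothetical; only the scores 0.105 and 0.161 come from the text.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, medoids, dist):
    """Put each point into the cluster of its nearest medoid."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        clusters[nearest].append(p)
    return clusters

def k_medoids(points, k, dist=euclidean, max_iter=100):
    """Alternate between assignment and medoid update until the medoids stop changing."""
    medoids = list(points[:k])  # naive initialization: first k points (assumption)
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        new_medoids = []
        for old, members in zip(medoids, clusters):
            if not members:
                new_medoids.append(old)  # keep the old medoid for an empty cluster
            else:
                # The new medoid minimizes the total distance to its cluster members.
                new_medoids.append(
                    min(members, key=lambda c: sum(dist(c, m) for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

# Hypothetical 2-D feature vectors, NOT the NBA data set.
players = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (8.0, 8.0), (9.0, 9.0), (10.0, 10.0)]
medoids, clusters = k_medoids(players, 2)
print("medoids:", medoids)

# Ratio of the two group typicality scores reported in the text.
ratio = 0.105 / 0.161
print("relative representativeness:", round(ratio, 2))
```

The medoids minimize distances within each cluster, which is a different objective from maximizing group typicality. This is one way to read why the two medoid players score lower than the top-2 representatively typical guards.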
4.5.2 Approximation Quality
To evaluate the query answering quality on uncertain data with a large number of
instances, we use the Quadraped Animal Data Generator, also from the UCI Machine
Learning Database Repository, to generate synthetic data sets with up to 25 numeric
attributes. We test the approximation quality of the RT (randomized tournament)
method, the DLTA (direct local typicality approximation) method, and the LT3
(local typicality approximation using tournaments) method on top-k simple typicality
queries, top-k discriminative typicality queries, and top-k representative typicality
queries, respectively. The results are reported in the rest of this section.
First of all, we test RT, DLTA, and LT3 for top-k simple typicality queries. To
measure the error made by an approximation algorithm, we use the following error
rate measure. For a top-k typicality query Q, let A be the set of k instances returned