Database Reference
In-Depth Information
there are. So he asks,“
Who are the top-3 typical guards in whole representing
different types of guards?
”
Simple typicality does not provide the correct answer to this question, since the
3
players with the greatest simple typicality may be quite similar to each other,
while some other popular players different from those three may be missed. Dis-
criminative typicality does not help either, because the exact types of guards and
their members are unknown.
To solve this problem, we develop the notion of
representative typicality
that mea-
sures how an instance is typical in an uncertain object different from the already
reported typical instances. Given an uncertain object O, a top-k representative
typicality query finds a set of k instances of O with the highest representative
typicality scores.
By default, we consider an uncertain object
O
on attributes
A
1
,···,
A
n
. Let
A
i
1
,···,
≤
≤
A
i
l
be the attributes on which the typicality queries are applied (1
i
j
n
for 1
≤
j
≤
l
) and
d
A
i
1
,···,
A
i
l
(
,
)
x
y
be the distance between two instances
x
and
y
in
S
on attributes
A
i
1
,···,
A
i
l
. When
A
i
1
,···,
A
i
l
are clear from context,
d
A
i
1
,···,
A
i
l
(
x
,
y
)
is
abbreviated to
d
.
We address the top-
k
typicality problem in a generic metric space. Therefore, the
distance metric
d
should satisfy the triangle inequality.
(
x
,
y
)
2.2.1.1 Simple Typicality
By intuition and as also suggested by the previous research in psychology and cogni-
tive science (as will be reviewed in Section 3.3.1), an instance
o
in
O
is more typical
than the others if
o
is more likely to appear in
O
. As discussed in Section 2.1.1, the
set of instances in
O
on attributes
A
1
,···,
A
n
can be viewed as a set of independent
and identically distributed samples of an
n
-dimensional random vector
X
that takes
values in the Cartesian product space
D
=
D
A
1
×···×
D
A
n
, where
D
A
i
is the domain
of attribute
A
i
(1
≤
i
≤
n
). The likelihood of
o
∈
O
, given that
o
is a sample of
X
,
can be used to measure the typicality of
o
.
Definition 2.5 (Simple typicality).
Given an uncertain object
O
on attributes
A
1
,···,
A
n
and a subset of attributes
A
i
1
,···,
A
i
l
(1
≤
i
j
≤
n
for 1
≤
j
≤
l
) of in-
terest, let
be the
n
-dimensional random vector generating the instances in
O
, the
simple typicality
of an instance
o
X
∈
O
with respect to
X
on attributes
A
i
1
,···,
A
i
l
is defined as
T
A
i
1
,···,
A
i
l
(
o
,X
)=
L
A
i
1
,···,
A
i
l
(
o
|X
)
where
L
A
i
1
,···,
A
i
l
(
o
|X
)
is the like-
lihood [27] of
o
on attributes
A
i
1
,···,
A
i
l
, given that
o
is a sample of
S
.
In practice, since the distribution of random vector
X
is often unknown, we
use
T
A
i
1
,···,
A
i
l
(
o
,
O
)=
L
A
i
1
,···,
A
i
l
(
o
|
O
)
as an estimator of
T
A
i
1
,···,
A
i
l
(
o
,X
)
, where
L
A
i
1
,···,
A
i
l
(
o
|
O
)
is the posterior probability of an object
o
on attributes
A
i
1
,···,
A
i
l
given
O
[27].
L
A
i
1
,···,
A
i
l
(
can be computed using density estimation methods. We adopt the
commonly used kernel density estimation method, which does not require any distri-
bution assumption on
O
. The general idea is to use a kernel function to approximate
o
|
O
)