Database Reference
In-Depth Information
Definition 2.7 (Representing region).
Given an uncertain object
O
on attributes
A
1
,···,
A
n
and a subset of instances
A
⊂
O
, let
D
=
D
A
1
×···×
D
A
n
where
D
A
i
is the
domain of attribute
A
i
(1
≤
i
≤
n
), the
representing region
of an instance
o
∈
A
is
D
(
o
,
A
)=
{
x
|
x
∈
D
,
d
(
x
,
o
)=
min
y
∈
A
d
(
x
,
y
)
}
, where
d
(
x
,
y
)
is the distance between
objects
x
and
y
.
To make
A
representative as a whole, the representing region of each instance
o
in
A
should be fairly large and
o
should be typical in its own representing region.
Definition 2.8 (Group typicality).
Given an uncertain object
O
on attributes
A
1
,···,
be the
n
-dimensional random vec-
tor generating the instances in
O
, the
group typicality
of
A
on attributes
A
i
1
,···,
A
n
and a subset of instances
A
⊂
O
, let
X
A
i
l
(1
≤
i
j
≤
n
,1
≤
j
≤
l
)is
GT
(
A
,X
)=
∑
o
∈
A
T
(
o
,X
D
(
o
,
A
)
)
·
Pr
(
D
(
o
,
A
))
, where
T
(
o
,X
D
(
o
,
A
)
)
is the simple typicality of
o
with respect to
X
in
o
's representing
region
D
(
o
,
A
)
and
Pr
(
D
(
o
,
A
))
is the probability of
D
(
o
,
A
)
.
Since the distribution of
X
is unknown, we can estimate the group typical-
ity
GT
(
A
,X
)
as follows. For any instance
o
∈
A
, let
N
(
o
,
A
,
O
)=
{
x
|
x
∈
O
∩
D
(
o
,
A
)
}
be the set of instances in
O
that lie in
D
(
o
,
A
)
,
Pr
(
D
(
o
,
A
))
can be esti-
mated using
|
N
(
o
,
A
,
O
)
|
|
O
|
. The group typicality
GT
(
A
,X
)
is estimated by
GT
(
A
,
O
)=
))
·
|
N
(
o
,
A
,
O
)
|
|
O
|
∑
o
∈
A
T
(
o
,
N
(
o
,
A
,
O
, where
T
(
o
,
N
(
o
,
A
,
O
))
is the estimator of simple
(
,X
D
(
o
,
A
)
)
(
,
,
)
typicality
T
o
, since
N
o
A
O
can be viewed as a set of independent and
identically distributed samples of
.
The group typicality score measures how representative a group of instances is.
The
size-k most typical group problem
is to find
k
instances as a group such that the
group has the maximum group typicality. Unfortunately, the problem is NP-hard,
since it has the discrete
k
-median problem as a special case, which was shown to be
NP-hard [28].
Moreover, top-
k
queries are generally expected to have the monotonicity in an-
swer sets. That is, the result of a top-
k
query is contained in the result of a top-
k
query where
k
X
that lie in
D
(
o
,
A
)
k
. However, an instance in the most typical group of size
k
may
not be in the most typical group of size
k
(
k
<
k
). For example, in the data set illus-
trated in Figure 2.2, the size-1 most typical group is
<
{
A
}
and the size-2 most typical
group is
, which does not contain the size-1 most typical group. Therefore, the
size-
k
most typical group is not suitable to define the top-
k
representative typicality.
To enforce monotonicity, we adopt a greedy approach.
{
B
,
C
}
Definition 2.9 (Representative typicality).
Given an uncertain object
O
and a
re-
ported answer set
A
be the random vector with respect to instances
in
O
, the
representative typicality
of an instance
o
⊂
O
, let
X
∈
(
O
−
A
)
is
RT
(
o
,
A
,X
)=
(
∪{
},X
)
−
(
,X
)
(
∪{
},X
)
(
,X
)
GT
A
o
GT
A
, where
GT
A
o
and
GT
A
are the
∪{
}
group typicality values of subsets
A
o
and
A
, respectively.
In practice, we use
RT
(
o
,
A
,
O
)=
GT
(
A
∪{
o
},
O
)
−
GT
(
A
,
O
)
to estimate
RT
(
o
,
A
,X
)
, where
GT
(
A
,
O
)
and
GT
(
A
∪{
o
},
O
)
are the estimators of
GT
(
A
,X
)
and
GT
(
A
∪{
o
},X
)
, respectively.