Database Reference
In-Depth Information
Definition 2.7 (Representing region). Given an uncertain object O on attributes
A 1 ,···,
A n and a subset of instances A
O , let D
=
D A 1 ×···×
D A n where D A i is the
domain of attribute A i (1
i
n ), the representing region of an instance o
A is
D
(
o
,
A
)= {
x
|
x
D
,
d
(
x
,
o
)=
min y A d
(
x
,
y
) }
, where d
(
x
,
y
)
is the distance between
objects x and y .
To make A representative as a whole, the representing region of each instance o
in A should be fairly large and o should be typical in its own representing region.
Definition 2.8 (Group typicality). Given an uncertain object O on attributes
A 1 ,···,
be the n -dimensional random vec-
tor generating the instances in O , the group typicality of A on attributes A i 1 ,···,
A n and a subset of instances A
O , let
X
A i l
(1
i j
n ,1
j
l )is GT
(
A
,X )= o A T
(
o
,X D ( o , A ) ) ·
Pr
(
D
(
o
,
A
))
, where
T
(
o
,X D ( o , A ) )
is the simple typicality of o with respect to
X
in o 's representing
region D
(
o
,
A
)
and Pr
(
D
(
o
,
A
))
is the probability of D
(
o
,
A
)
.
Since the distribution of
X
is unknown, we can estimate the group typical-
ity GT
(
A
,X )
as follows. For any instance o
A , let N
(
o
,
A
,
O
)= {
x
|
x
O
D
(
o
,
A
) }
be the set of instances in O that lie in D
(
o
,
A
)
, Pr
(
D
(
o
,
A
))
can be esti-
mated using | N ( o , A , O ) |
| O |
. The group typicality GT
(
A
,X )
is estimated by GT
(
A
,
O
)=
)) · | N ( o , A , O ) |
| O |
o A T
(
o
,
N
(
o
,
A
,
O
, where T
(
o
,
N
(
o
,
A
,
O
))
is the estimator of simple
(
,X D ( o , A ) )
(
,
,
)
typicality T
o
, since N
o
A
O
can be viewed as a set of independent and
identically distributed samples of
.
The group typicality score measures how representative a group of instances is.
The size-k most typical group problem is to find k instances as a group such that the
group has the maximum group typicality. Unfortunately, the problem is NP-hard,
since it has the discrete k -median problem as a special case, which was shown to be
NP-hard [28].
Moreover, top- k queries are generally expected to have the monotonicity in an-
swer sets. That is, the result of a top- k query is contained in the result of a top- k
query where k
X
that lie in D
(
o
,
A
)
k . However, an instance in the most typical group of size k may
not be in the most typical group of size k ( k
<
k ). For example, in the data set illus-
trated in Figure 2.2, the size-1 most typical group is
<
{
A
}
and the size-2 most typical
group is
, which does not contain the size-1 most typical group. Therefore, the
size- k most typical group is not suitable to define the top- k representative typicality.
To enforce monotonicity, we adopt a greedy approach.
{
B
,
C
}
Definition 2.9 (Representative typicality). Given an uncertain object O and a re-
ported answer set A
be the random vector with respect to instances
in O , the representative typicality of an instance o
O , let
X
(
O
A
)
is RT
(
o
,
A
,X )=
(
∪{
},X )
(
,X )
(
∪{
},X )
(
,X )
GT
A
o
GT
A
, where GT
A
o
and GT
A
are the
∪{
}
group typicality values of subsets A
o
and A , respectively.
In practice, we use RT
(
o
,
A
,
O
)=
GT
(
A
∪{
o
},
O
)
GT
(
A
,
O
)
to estimate
RT
(
o
,
A
,X )
, where GT
(
A
,
O
)
and GT
(
A
∪{
o
},
O
)
are the estimators of GT
(
A
,X )
and GT
(
A
∪{
o
},X )
, respectively.
 
Search WWH ::




Custom Search