Database Reference
In-Depth Information
4.2 Local Typicality Approximation
While the randomized tournament method is efficient, it does not have formally
provable accuracy. Can we provide some quality guarantee and at the same time
largely retain the efficiency? In this section, we develop several heuristic local typi-
cality approximation methods. Our discussion in this section is for simple typicality.
The methods will be extended to other typicality measures later in this chapter.
4.2.1 Locality of Typicality Approximation
In Gaussian kernel estimation, given two instances a and p in an uncertain object O ,
2
2 h 2 ,
where n is the cardinality of the uncertain object O . The contribution of p decays
exponentially as the distance between a and p increases. Therefore, if p is remote
from a , p contributes very little to the simple typicality score of a .
Moreover, in a metric space, given three instances a , b and p , the triangle inequal-
d
(
a
,
p
)
e
1
nh 2
the contribution from p to T
(
a
,
S
)
, the simple typicality score of a ,is
π
ity
.
Therefore, the instances far away from a and b will have similar contributions to the
probability density values T
|
d
(
a
,
p
)
d
(
b
,
p
) | <
d
(
a
,
b
)
holds. If d
(
a
,
p
)
d
(
a
,
b
)
, then d
(
a
,
p
)
d
(
b
,
p
)
.
Based on the above observations, given an uncertain object O and a subset C
(
a
,
O
)
and T
(
b
,
O
)
O , can we use the locality to approximate the instance having the largest simple
typicality value in C ?
Definition 4.1 (Neighborhood region). Given an uncertain object O , a neighbor-
hood threshold
σ
, and a subset C
O , let D
=
D A 1 ×···×
D A n where D A i
is the
domain of attribute A i (1
i
n ), the
σ
-neighborhood region of C is defined as
o ) }≤ σ }
D
(
C
, σ)= {
o
|
o
D
,
min o C {
d
(
o
,
.
Definition 4.2 (Local simple typicality). Given an uncertain object O , a neigh-
borhood threshold
be the random vector that gen-
erates samples O , the local simple typicality of an object o
σ
, and a subset C
O , let
X
C is defined as
LT
(
o
,
C
,X , σ)=
L
(
o
|X D ( C , σ ) )
where L
(
o
|X D ( C , σ ) )
is the likelihood of o given
that it is a sample of
X
in region D
(
C
, σ )
.
In practice, for each instance o
O , we use the set of instances in O that lie in
o 's
σ
-neighborhood region to estimate the simple typicality of o .
Definition 4.3 (Local neighborhood). Given an uncertain object O , a neighbor-
hood threshold
σ
, and a subset C
O , The
σ
-neighborhood of C is defined as
LN
(
C
,
O
, σ)= {
o
|
o
O
D
(
C
, σ ) }
, where D
(
C
, σ )
is the
σ
-neighborhood region
of C .
LN
(
C
,
O
, σ )
is the set of instances in O whose distance to at least one instance
in C is at most
σ
. Then, LT
(
o
,
C
,X , σ )
can be estimated using LT
(
o
,
C
,
O
, σ)=
 
Search WWH ::




Custom Search