Database Reference
In-Depth Information
rounds of validations, the quality of randomized tournament may increase, but after
3 rounds, the quality improvement is very small.
Figure 4.4(a) shows the change of approximation quality with respect to the
neighborhood threshold. In the figure, the error bounds given by Theorems 4.1
and 4.3 are also plotted, which are labeled as UB
, respec-
tively. To make the curves legible, the error rates are in the logarithmic scale.
Clearly, the larger the neighborhood threshold, the more accurate the local typical-
ity approximation. Our methods perform much better than the error bounds, which
shows that they are effective in practice.
In Figure 4.4(b), we vary the value of k in the top- k queries. The approximation
quality of RT is not sensitive to k , since it runs k times to select the top- k answers.
Both DLTA and LT3 see a larger error rate with a larger value of k , this is because
the distant neighbors may get a better chance to play a role in typicality when more
instances are returned.
Figure 4.4(c) shows the impact of dimensionality on the error rate. DLTA
achieves the best approximation quality, the error rate is up to 0
(
DLTA
)
and UB
(
LT 3
)
066%. LT3 has
an accuracy close to DLTA, and is much better than RT. The error rate decreases
as the dimensionality increases, since the average pairwise distance among the in-
stances in the object also increases and the local typicality approximation becomes
more effective.
Figure 4.4(d) tests the approximation quality versus the number of instances.
When the cardinality increases, the instances becomes denser, and the local typical-
ity approximation is more accurate. That is why LT3 and DLTA perform better with
larger number of instances. However, the approximation quality of RT decreases in
large number of instances, since with a fixed tournament group size, the larger the
number of instances, the more likely the most typical instance in a random group is
biased.
In summary, DLTA and LT3 both achieve better accuracy than RT, which strongly
justifies the effectiveness of our local typicality approximation technique. The accu-
racy of LT3 is slightly lower than DLTA, but as we will show in Section 4.5.4, LT3
is much more efficient than DLTA.
We also report the approximation quality of top- k discriminative typicality query
answering algorithms. By default, the data contains 10
.
000 instances with 5 at-
tributes. We randomly assign 20% of the instances into the target object O , and
conduct top-10 discriminative typicality queries. The neighborhood threshold
,
of
DLTA and LT3 is set to 2 h , where h is the bandwidth of Gaussian kernels. The group
size of randomized tournament is set to 50, and 4 validations are conducted. Here we
increase the tournament size to 50, since only 20% instances are actually involved
in the tournament, as we explained in Section 4.3.1.
The error rate measure is defined as follows. We normalize DT
σ
(
o
,
O
,
S
O
)
as
DT (
o
,
C
,
S
C
)=
DT
(
o
,
O
,
S
O
)
min x C DT
(
x
,
O
,
S
O
)
, in order to make the
DT
value always non-negative. For a top- k discriminative typicality
query Q , let A be the set of k instances returned by the exact algorithm, and A be the
set of k instances returned by an approximation algorithm. Then, the error rate e is
(
o
,
O
,
S
O
)
Search WWH ::




Custom Search