Databases Reference
In-Depth Information
total region monitored by the LTS. At each iteration the current area q prev
is partitioned into quadrants of equal size. If less than k users are located
in the quadrant q where the issuer of the request is located, then q prev is
returned. Otherwise, iteration continues considering q as the next area. In
order to evaluate the time complexity of the algorithm it is necessary to make
some assumptions about the data structures. In our implementation of the
algorithm, we used a data structure consisting of a quadTree in which each
leaf has a pointer to a user, and each internal node n stores the number of
users “contained” in n i.e., the number of users stored in the subtree that
has n as root. The generalization algorithm traverses the quadTree from the
root to the first internal node that contains at least k users. Each iteration is
constant time and the number of iterations is bounded by the height of the
quadTree. In the worst case, the height of the tree is linear in the cardinality
of the set I of users. However, if users are uniformly distributed (as in the case
of the experimental results that we present in Section 5) the height of the tree
is logarithmic in the number of users hence the algorithm has a worst-case
time complexity of O (log(
)).
Mokbel et al. [16] propose Casper , a framework for privacy protection that
includes a generalization algorithm. In this paper we consider the “basic” data
structure 3 used by Casper i.e., a balanced quadTree in which each node has
a pointer to its parent, and users are stored in leaf nodes only. Moreover,
the data structure consists of a table in which each user i is associated with
the leaf node that contains i . The generalization algorithm starts from the
leaf node that contains the issuer of the request, and iteratively traverses the
tree towards the root until an area that contains at least k users is found. At
each iteration, the algorithm considers the union of the area covered by the
current node n and the horizontally (vertically, resp.) contiguous area covered
by its sibling node. If only one of these two joined areas contains more than
k users, that area is returned; if both of them contain more than k users, the
one containing the minimum number is returned; otherwise, the algorithm
proceeds with the next iteration. Similarly to IntervalCloaking , the worst case
time complexity of Casper is linear in the height of the quadTree. However,
in this case, this height is bounded by the logarithm of the number of leaf
nodes if users are uniformly distributed, and it is at most linear in the same
number, otherwise.
Conceptually, one of the simplest ways to generalize a request is to compute
the k Nearest Neighbor query among the users and return the MBR of the
result. We call nnALG this generalization algorithm. The problem of this
approach is that in most cases the issuer of the request is located close to
the center of the resulting area; hence, he can be easily discovered by the
attacker [11]. To partially overcome this problem, Kalnis et al. [11] propose
the nnASR generalization algorithm that picks a random user i in the set of
|
I
|
3 For the purpose of this paper, there is no need to consider the “adaptive” data
structure proposed in the paper.
Search WWH ::




Custom Search