was in the shopping mall, then in the park, and then at the train station. The
attacker could employ such knowledge to retrieve the complete trajectory of P
in the released data set: this attempt would succeed, provided that the attacker
knows that P's trajectory is actually present in the data set, if the known
trajectory is compatible with (i.e., is a subtrajectory of) just one trajectory
in the data set. In this example of a linkage attack in the movement data domain, the
subtrajectory known by the attacker serves as QI, while the entire trajectory is
the PI that is disclosed after the reidentification of the respondent. Clearly, as
the example suggests, it is rather difficult to distinguish QI from PI: in principle,
any specific location can be the theater of a shadowing action by a spy, and
therefore any possible sequence of locations can be used as a QI, that is, as a
means for reidentification. Put another way, distinguishing between QI and PI
among the locations means putting artificial limits on the attacker's background
knowledge; on the contrary, privacy and security research requires assumptions
on the attacker's knowledge that are as liberal as possible, in order to
achieve maximal protection.
As a consequence of this discussion, it is reasonable to consider the radical
assumption that any (sub)trajectory that can be linked to a small number of
individuals is a potentially dangerous QI and a potentially sensitive PI. Therefore,
in the trajectory linkage attack, the malicious party M knows a subtrajectory of
a respondent R (e.g., a sequence of locations where R has been spied on by M),
and M would like to identify in the data the whole trajectory belonging to R,
that is, learn all places visited by R.
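The attack just described can be sketched in a few lines of Python. This is a minimal illustration, not a method from the text: locations are simplified to place names, timestamps are omitted, and the toy data set and all identifiers (`is_subsequence`, `linkage_attack`, the trajectory ids) are invented for the example.

```python
def is_subsequence(known, traj):
    """True if `known` occurs in `traj` as an order-preserving subsequence,
    i.e., the known locations appear in `traj` in the same order
    (not necessarily contiguously)."""
    it = iter(traj)
    return all(place in it for place in known)  # `in` advances the iterator

def linkage_attack(known, dataset):
    """Return the ids of all trajectories compatible with the attacker's
    knowledge. Reidentification succeeds when exactly one id survives."""
    return [tid for tid, traj in dataset.items() if is_subsequence(known, traj)]

# Toy deidentified data set: trajectory id -> sequence of visited places.
dataset = {
    "t1": ["home", "mall", "park", "station", "office"],
    "t2": ["mall", "cinema", "park", "gym"],
    "t3": ["park", "station", "mall"],
}

# The attacker shadowed the respondent at the mall, then the park,
# then the station -- this subtrajectory is the QI.
known = ["mall", "park", "station"]

print(linkage_attack(known, dataset))  # only "t1" is compatible
```

With just three observed locations the known subtrajectory singles out t1, and the attacker learns the whole trajectory (the PI), including the unobserved "home" and "office" visits.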
Privacy-Preserving Techniques
How is it possible to guarantee that the probability of success of the above attack
is very low while preserving the utility of the data for meaningful analyses?
Consider the source trajectories represented in Figure 9.4, obtained from a
massive data set of GPS traces (17,000 private vehicles tracked in the city of
Milan, Italy, during a week).
Each trajectory is a deidentified sequence of timestamped locations, visited
by one of the tracked vehicles. Albeit deidentified, each trajectory is essen-
tially unique - very rarely are two different trajectories exactly the same given
the extremely fine spatio-temporal resolution involved. As a consequence, the
chances of success for the trajectory linkage attack are not low. If the attacker
M knows a sufficiently long subsequence S of locations visited by the respon-
dent R , it is possible that only a few trajectories in the data set match with S ,
possibly just one. Indeed, publishing raw trajectory data such as those depicted
in Figure 9.4 is an unsafe practice, which runs a high risk of violating the pri-
vate sphere of the tracked drivers (e.g., guessing the home place and the work
place of most respondents is very easy). Now, assume that one wants to dis-
cover the trajectory clusters emerging from the data through data mining, that