evolution when humans gain or lose weight. Hence, a noise model can
be constructed that generates realistic-looking parameters for both the
direction and time constant of weight changes. The resulting perturbed
stream can be aggregated with that of others in the community. Since
the distributions of noise model parameters are statistically known, it is
possible to estimate the sum, average and distribution of added noise (of
the entire community) as a function of time. Subtracting that known
average noise time series from the sum of perturbed community curves
will thus yield the true community trend. The distribution of community
data at a given time can similarly be estimated (using deconvolution
methods), since the distribution of the noise (i.e., data from virtual users) is
known. The estimate improves with community size.
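The scheme described above can be sketched in a minimal simulation. Everything specific here is assumed for illustration (Gaussian noise, the weight values, the trend, and the community sizes are not taken from the text): each user adds noise with publicly known statistics, the server averages the perturbed streams and subtracts the known mean noise, and the reconstruction error shrinks as the community grows.

```python
import random

def perturb(stream, noise_mean, noise_std, rng):
    # Each user adds noise drawn from a publicly known distribution
    # before sharing the stream with the aggregation server.
    return [x + rng.gauss(noise_mean, noise_std) for x in stream]

rng = random.Random(0)
T = 10                             # time steps in each stream
noise_mean, noise_std = 2.0, 5.0   # statistically known noise parameters

for n_users in (10, 100, 1000):
    # Hypothetical true weight streams (kg) with a mild upward trend.
    true_streams = [[70 + rng.gauss(0, 10) + 0.1 * t for t in range(T)]
                    for _ in range(n_users)]
    perturbed = [perturb(s, noise_mean, noise_std, rng)
                 for s in true_streams]

    # Aggregate the perturbed streams, then subtract the known
    # average noise to recover the community trend.
    est_avg = [sum(p[t] for p in perturbed) / n_users - noise_mean
               for t in range(T)]
    true_avg = [sum(s[t] for s in true_streams) / n_users
                for t in range(T)]
    err = max(abs(a - b) for a, b in zip(est_avg, true_avg))
    print(f"{n_users:5d} users: max error in community average = {err:.3f} kg")
```

The residual error is simply the deviation of the realized average noise from its expectation, which is why it diminishes (roughly as the inverse square root of community size) as more users contribute.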
The approach preserves individual user privacy while allowing accu-
rate reconstruction of community statistics. Several research questions
arise that require additional work. For example, what is a good upper
bound on the reconstruction error of the data aggregation result as
a function of the noise statistics introduced to perturb the individual
inputs? What noise generation techniques minimize this reconstruction
error (to achieve accurate aggregation results) while maximizing the
noise itself (for privacy)? How can one ensure that individual data streams
cannot be inferred from the perturbed signal? What are lower bounds
on the error in reconstructing individual data streams? What
noise generation techniques maximize that error for privacy? Privacy
challenges further include the investigation of attack models involving
corrupt noise models (e.g., ones that attempt to deceive non-expert users
into using perturbation techniques that do not achieve adequate privacy
protection), malicious clients (e.g., ones that do not follow the correct
perturbation schemes or send bogus data), and repeated server queries
(e.g., to infer additional information about the evolution of client data from
incremental differences in query responses). For example, given that it
is fundamentally impossible to tell whether a user is sharing a properly
perturbed version of their real weight or just a random value, what
fraction of malicious users can be accommodated without significantly
affecting the reconstruction accuracy of community statistics? Can the damage
imposed by a single user be bounded using outlier detection techniques
that exclude obviously malicious users? How does the accuracy of outlier
detection depend on the scale of allowable perturbation? In general,
how can one quantify the tradeoff between privacy and robustness to malicious
user data? How tolerant is the perturbation scheme to collusion among
users who aim to bias community statistics? Importantly, how does the
time-series nature of the data affect the answers to the above questions com-