evolution when humans gain or lose weight. Hence, a noise model can
be constructed that generates realistic-looking parameters for both the
direction and time constant of weight changes. The resulting perturbed
stream can be aggregated with that of others in the community. Since
the distributions of noise model parameters are statistically known, it is
possible to estimate the sum, average and distribution of added noise (of
the entire community) as a function of time. Subtracting that known
average noise time series from the sum of perturbed community curves
will thus yield the true community trend. The distribution of community
data at a given time can similarly be estimated (using deconvolution
methods), since the distribution of the noise (i.e., data from virtual users) is
known. The estimate improves with community size.
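The scheme described above can be sketched in a minimal simulation. Everything specific here is assumed for illustration (Gaussian noise, the weight values, the trend, and the community sizes are not taken from the text): each user adds noise with publicly known statistics, the server averages the perturbed streams and subtracts the known mean noise, and the reconstruction error shrinks as the community grows.

```python
import random

def perturb(stream, noise_mean, noise_std, rng):
    # Each user adds noise drawn from a publicly known distribution
    # before sharing the stream with the aggregation server.
    return [x + rng.gauss(noise_mean, noise_std) for x in stream]

rng = random.Random(0)
T = 10                             # time steps in each stream
noise_mean, noise_std = 2.0, 5.0   # statistically known noise parameters

for n_users in (10, 100, 1000):
    # Hypothetical true weight streams (kg) with a mild upward trend.
    true_streams = [[70 + rng.gauss(0, 10) + 0.1 * t for t in range(T)]
                    for _ in range(n_users)]
    perturbed = [perturb(s, noise_mean, noise_std, rng)
                 for s in true_streams]

    # Aggregate the perturbed streams, then subtract the known
    # average noise to recover the community trend.
    est_avg = [sum(p[t] for p in perturbed) / n_users - noise_mean
               for t in range(T)]
    true_avg = [sum(s[t] for s in true_streams) / n_users
                for t in range(T)]
    err = max(abs(a - b) for a, b in zip(est_avg, true_avg))
    print(f"{n_users:5d} users: max error in community average = {err:.3f} kg")
```

The residual error is simply the deviation of the realized average noise from its expectation, which is why it diminishes (roughly as the inverse square root of community size) as more users contribute.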
The approach preserves individual user privacy while allowing accu-
rate reconstruction of community statistics. Several research questions
arise that require additional work. For example, what is a good upper
bound on the reconstruction error of the data aggregation result as
a function of the noise statistics introduced to perturb the individual
inputs? What noise generation techniques minimize this reconstruction
error (to achieve accurate aggregation results) while maximizing the
noise itself (for privacy)? How can one ensure that individual data streams
cannot be inferred from the perturbed signal? What are lower bounds
on the error in reconstructing individual data streams? What
noise generation techniques maximize that error for privacy? Privacy
challenges further include the investigation of attack models involving
corrupt noise models (e.g., ones that attempt to deceive non-expert users
into using perturbation techniques that do not achieve adequate privacy
protection), malicious clients (e.g., ones that do not follow the correct
perturbation schemes or send bogus data), and repeated server queries
(e.g., to infer additional information about the evolution of client data from
incremental differences in query responses). For example, given that it
is fundamentally impossible to tell whether a user is sharing a properly
perturbed version of their real weight or just a random value, what
fraction of malicious users can be accommodated without significantly
affecting the reconstruction accuracy of community statistics? Can the damage
imposed by a single user be bounded using outlier detection techniques
that exclude obviously malicious users? How does the accuracy of outlier
detection depend on the scale of allowable perturbation? In general,
how can one quantify the tradeoff between privacy and robustness to malicious
user data? How tolerant is the perturbation scheme to collusion among
users who aim to bias community statistics? Importantly, how does the
time-series nature of the data affect the answers to the above questions com-