challenge and responses makes the protocol efficient. However, PDP in its original
form does not work efficiently for dynamic data.
Another, similar approach is based on inserting sentinels, or special markers,
inside the stored file. In this Proof of Retrievability (POR) approach [20], clients
can send small challenges for file blocks, and the presence of unmodified sentinels
provides a probabilistic guarantee about the integrity of files.
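A minimal sketch of this sentinel idea is given below (the block size, the number of sentinels, and the function names are illustrative assumptions, not details from [20]): the client hides random sentinel blocks at secret positions before upload, later challenges a few of those positions, and accepts only if the returned blocks match the sentinel values it kept locally.

import os
import random

BLOCK_SIZE = 64  # illustrative block size

def prepare(file_blocks, num_sentinels=4):
    """Client side: insert random sentinel blocks at secret positions."""
    blocks = list(file_blocks)
    secrets = {}  # secret position -> sentinel value, kept by the client
    for _ in range(num_sentinels):
        sentinel = os.urandom(BLOCK_SIZE)
        pos = random.randint(0, len(blocks))
        blocks.insert(pos, sentinel)
        # shift previously recorded positions at or after the insertion point
        secrets = {(p + 1 if p >= pos else p): v for p, v in secrets.items()}
        secrets[pos] = sentinel
    return blocks, secrets  # blocks go to the cloud; secrets stay with the client

def respond(stored_blocks, positions):
    """Server side: return the blocks named in the challenge."""
    return [stored_blocks[p] for p in positions]

def verify(responses, secrets, positions):
    """Client side: every returned block must equal the stored sentinel value."""
    return all(responses[i] == secrets[p] for i, p in enumerate(positions))

data = [os.urandom(BLOCK_SIZE) for _ in range(10)]
stored, secrets = prepare(data)
challenge = random.sample(list(secrets.keys()), 2)
print(verify(respond(stored, challenge), secrets, challenge))  # True if intact

Because corrupting or discarding a sizable portion of the file is likely to touch at least one sentinel, a small number of such challenges already yields the probabilistic guarantee described above.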
19.3.4 Confidentiality of Data and Computation
Research Question 4: How can we ensure confidentiality of data and computations
in a cloud?
Many users need to store sensitive data items in the cloud. For example, healthcare
and business data need extra protection mandated by many government regulations.
However, storing sensitive and confidential data with an untrusted third-party cloud
provider exposes the data both to the cloud provider and to malicious intruders who
have compromised the cloud.
Encryption can be a simple solution for ensuring the confidentiality of data sent to a
cloud. However, encryption comes at a cost: searching and sorting encrypted data
is expensive and reduces performance. A potential solution is to use homomorphic
encryption to compute on encrypted data in the cloud. However, fully homomorphic
encryption is very inefficient, and to this day no practical fully homomorphic
encryption scheme has been developed.
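As a toy illustration of the homomorphic idea (not a practical or secure scheme, and only multiplicatively rather than fully homomorphic), textbook RSA with tiny, hard-coded parameters already lets a server combine two ciphertexts without ever seeing the plaintexts:

p, q = 61, 53                # toy primes, far too small for real use
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (requires Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 6, 7
c1, c2 = encrypt(m1), encrypt(m2)

# The cloud multiplies the ciphertexts without learning m1 or m2 ...
c_product = (c1 * c2) % n

# ... and the client decrypts to obtain the product of the plaintexts.
print(decrypt(c_product))    # 42, i.e., m1 * m2

A fully homomorphic scheme would additionally support addition, and hence arbitrary computation, on ciphertexts, which is exactly where the efficiency problems mentioned above arise.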
19.3.5 Privacy
Research Question 5: How do we perform outsourced computation while guaranteeing
user privacy [28]?
For Big Data sets of very large scale, clients or one-time users of such data sets
often do not have the capability to download the data to their own systems. A
very common technique is to divide the system into a data provider (which holds the
data objects), a computation provider (which supplies the code), and a computational
platform (such as a MapReduce framework on which the code is run over the data),
as sketched below. However, for data sets containing personal information, a big
challenge is to prevent unauthorized leaks of private information back to the clients.
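The sketch below illustrates this three-role split under simplifying assumptions (the class and function names are made up for illustration): only the computational platform ever sees both the code and the raw records, and the client receives only aggregate results.

class DataProvider:
    """Data provider: holds the raw records and never ships them to the client."""
    def __init__(self, records):
        self._records = records
    def records(self):
        return iter(self._records)

# Computation provider: supplies only the analysis code, here a map and a
# reduce function in the MapReduce style mentioned above.
def mapper(record):
    yield ("age_sum", record["age"])

def reducer(key, values):
    return key, sum(values)

def run(data_provider, mapper, reducer):
    """Computational platform: the only party that sees both code and data."""
    groups = {}
    for record in data_provider.records():
        for key, value in mapper(record):
            groups.setdefault(key, []).append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())

provider = DataProvider([{"age": 34}, {"age": 29}, {"age": 41}])
print(run(provider, mapper, reducer))  # {'age_sum': 104}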
As an example, suppose that a researcher wants to run an analysis on the medical
records of 100,000 patients of a hospital. The hospital cannot release the data to the
researcher due to privacy issues, but it can make the data accessible to a trusted
third-party computational platform, where the code supplied by the researcher
(computation provider) is run on the data, with the results being sent back to the researcher.
However, this model has risks: if the researcher is malicious, he can write
code that leaks private information from the medical records, either directly through
the result data or via indirect means. To prevent such privacy violations, researchers
have proposed techniques that use the notion of differential privacy. For example, the
Airavat framework [28] modifies the MapReduce framework to incorporate differential
privacy, thereby preventing the leakage of private information. However, the
current state of the art in this area is very inefficient in terms of performance, often
incurring more than 30% overhead for privacy protection.
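The core mechanism behind such differential-privacy guarantees can be sketched as follows (this is the generic Laplace mechanism, not Airavat's actual implementation; the epsilon value and the toy records are illustrative assumptions): the platform adds calibrated random noise to each aggregate result before releasing it, so the output reveals very little about any single patient.

import random

def laplace_noise(scale):
    # The difference of two i.i.d. exponentials with mean `scale` is
    # Laplace(0, scale)-distributed.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon=0.1):
    # A counting query has sensitivity 1 (adding or removing one record changes
    # the count by at most 1), so noise with scale 1/epsilon suffices.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

patients = [{"diagnosis": "diabetes"}] * 120 + [{"diagnosis": "other"}] * 880
print(private_count(patients, lambda r: r["diagnosis"] == "diabetes"))

Releasing only such noised aggregates trades a small amount of accuracy for privacy, in addition to the runtime overhead noted above.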