Computing aims to offer affordable and scalable computing capacities, which
meet the needs of MapReduce applications. However, because of the lack of
security mechanisms provided by Cloud providers to ensure data privacy, users
are still reluctant to offload the processing of their sensitive data sets.
Desktop Grids (DG) [CF12] are a form of volunteer computing that has
been successful thanks to the high computing and storage power they offer at
a low economic cost. The architecture of this infrastructure is based on the
federation of free resources: users voluntarily contribute their machines
when these are idle. Volatility and security are among the constraints that
discourage users from exploiting this enormous potential.
Our contribution is to enhance MapReduce security, so that it protects data
sent by users to remote computing infrastructures from leakage and eavesdropping.
More specifically, users face two kinds of threats: 1) during data distribution,
an eavesdropper could intercept data while it is being transferred, and 2)
when data is stored or processed, a malicious worker could have access to it.
Unfortunately, although encryption can protect data transfer and storage, it cannot
prevent the spying of data once it is deciphered for computation. There exist
techniques that allow encrypted data to be processed; however, they are not yet
generic enough to support any kind of computation.
As MapReduce is based on parallel processing, data has to be divided over
the computing nodes so each one processes a chunk as an input file. To improve
data privacy, our approach is to use a combination of trusted and untrusted
infrastructures, for instance private and public Clouds, to store the data set and
execute the MapReduce applications. Our approach relies on the Information
Dispersal Algorithm (IDA) to split and distribute the data.
Our idea is to break data into meaningless chunks so that a malicious worker
or eavesdropper cannot get access to meaningful data. A meaningless chunk is
useless information when taken alone, so even if a malicious worker has access
to it, the meaningful data remains protected. To do so, we use IDA,
which generates several chunks from an input file and disperses them over several
machines. Each machine aiming to access the data has to contact other machines
to get the missing chunks and reconstruct the needed information. In our case, we
call the chunks produced by IDA meaningless data. So, if a malicious node holds one
chunk, it has to contact and collaborate with other nodes to get the missing ones.
The lack of a single chunk prevents the malicious user from accessing the meaningful
data. In order to hide some chunks from malicious users, we use a hybrid cloud
infrastructure. If m chunks are necessary to reconstruct the data, we deploy m − 1
chunks on untrusted infrastructures, such as public clouds and desktop grids. The
remaining chunks are deployed on a private cloud. We assume that a private
cloud is highly secure and cannot be accessed by malicious users.
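The dispersal and placement idea described above can be illustrated with a minimal sketch. It assumes a Rabin-style IDA over the prime field of size 257, built from a Vandermonde matrix. The function names split, reconstruct and place, the field choice, and the parameters m = 3 and n = 5 are all hypothetical choices made for illustration, not the implementation evaluated in this paper.

```python
P = 257  # assumed field size: smallest prime above 255, so each byte is a field element


def _row(x, m):
    """Vandermonde row (1, x, x^2, ..., x^(m-1)) mod P."""
    return [pow(x, j, P) for j in range(m)]


def split(data: bytes, m: int, n: int):
    """Encode data into n chunks so that any m of them rebuild it (requires n < P)."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    chunks = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(padded), m):
        block = padded[i:i + m]
        for x in chunks:
            chunks[x].append(sum(r * b for r, b in zip(_row(x, m), block)) % P)
    return chunks, len(data)


def _solve(matrix, rhs):
    """Solve matrix * v = rhs mod P by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse, Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def reconstruct(subset, m, length):
    """Rebuild the original bytes from a dict holding at least m chunks."""
    if len(subset) < m:
        raise ValueError("need at least m chunks")
    xs = sorted(subset)[:m]
    matrix = [_row(x, m) for x in xs]
    out = bytearray()
    for k in range(len(subset[xs[0]])):
        out.extend(_solve(matrix, [subset[x][k] for x in xs]))
    return bytes(out[:length])


def place(chunks, m):
    """Put m - 1 chunks on untrusted nodes and the remaining ones on the private cloud."""
    ids = sorted(chunks)
    untrusted = {x: chunks[x] for x in ids[:m - 1]}
    trusted = {x: chunks[x] for x in ids[m - 1:]}
    return untrusted, trusted


data = b"sensitive record"
chunks, length = split(data, m=3, n=5)
untrusted, trusted = place(chunks, m=3)
```

In this sketch the untrusted side holds only two of the three chunks needed, so reconstruct(untrusted, 3, length) raises ValueError, while adding any one chunk from the private cloud rebuilds the data.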
The rest of the paper is organized as follows. Section 2 presents the dispersal
algorithm IDA and MapReduce. In Section 3, we describe our approach and
its various components. Section 4 analyzes the experimental results. Section 5
presents related work. Finally, we conclude in Section 6.
 