Towards Privacy for MapReduce on Hybrid
Clouds Using Information Dispersal Algorithm
Asma Ben Cheikh 1 , Heithem Abbes 1 , and Gilles Fedak 2
1 Universite de Tunis, LaTICE, ENSIT, 5 Avenue Taha Hussein, Tunis, Tunisie
{asma.bencheik,heithem.abbes}@latice.rnu.tn
2 Université de Lyon, INRIA, France
gilles.fedak@inria.fr
Abstract. MapReduce is a powerful model for parallel data processing.
The motivation of this work is to allow MapReduce jobs to run partially
on untrusted infrastructures, such as public clouds and desktop grids,
while using a trusted infrastructure, such as a private cloud, to ensure
that no outsider can obtain the 'entire' information. Our idea is to
break data into meaningless chunks and spread them over a combination
of public and private clouds so that a compromise would not allow the
attacker to reconstruct the whole data set. To realize this, we use an
Information Dispersal Algorithm (IDA), which splits a file into pieces
so that, when the pieces are carefully dispersed, a single node cannot
reconstruct the data unless it collaborates with other nodes. We propose
a protocol that allows MapReduce computing nodes to exchange data and
perform IDA-aware MapReduce computation. We conduct experiments on the
Grid'5000 testbed and report on the performance evaluation of the prototype.
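
As an illustration of the dispersal idea summarized above, the following is a minimal Python sketch of a Rabin-style IDA. It is not the prototype evaluated in this work: the function names (disperse, reconstruct, solve_mod), the choice of the prime field GF(257), and the Vandermonde encoding matrix are all assumptions made for the example. Data is cut into blocks of m bytes, each block is mapped to n values, and any m of the n resulting pieces are enough to rebuild the input, whereas a single piece (for m >= 2) does not determine the data uniquely.

```python
from itertools import combinations

P = 257  # assumed field size: smallest prime above 255, so every byte fits


def disperse(data: bytes, n: int, m: int) -> list[list[int]]:
    """Split data into n pieces; any m of them suffice to rebuild it."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    pieces = [[] for _ in range(n)]
    for start in range(0, len(padded), m):
        block = padded[start:start + m]
        for i in range(n):
            x = i + 1  # distinct evaluation point for piece i
            # Row i of a Vandermonde matrix times the block, modulo P.
            value = sum(b * pow(x, j, P) for j, b in enumerate(block)) % P
            pieces[i].append(value)
    return pieces


def solve_mod(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * x = rhs over GF(P) by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size] for row in aug]


def reconstruct(shares: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the original bytes from m pieces, keyed by piece index."""
    if len(shares) < m:
        raise ValueError("fewer than m pieces: the data cannot be rebuilt")
    idx = sorted(shares)[:m]
    matrix = [[pow(i + 1, j, P) for j in range(m)] for i in idx]
    out = bytearray()
    for k in range(len(shares[idx[0]])):
        out.extend(solve_mod(matrix, [shares[i][k] for i in idx]))
    return bytes(out[:length])  # drop the zero padding


# Usage: 5 pieces, any 3 of which rebuild the input.
data = b"MapReduce on hybrid clouds"
pieces = disperse(data, n=5, m=3)
for chosen in combinations(range(5), 3):
    subset = {i: pieces[i] for i in chosen}
    assert reconstruct(subset, m=3, length=len(data)) == data
```

In a hybrid setting, one could imagine keeping enough pieces on the private cloud that the pieces stored on public nodes never reach the threshold m; how pieces are actually placed and exchanged is the subject of the protocol proposed in this work and is not modelled by this sketch.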
1 Introduction
MapReduce is a powerful parallel data processing model that simplifies the
programming of data-intensive applications, i.e., applications that manipulate
enormous amounts of data at large scale. Recently, many organizations have
adopted the MapReduce model and implemented their own frameworks, such as
Google MapReduce [DG04], Yahoo! Hadoop [Whi09] and BitDew [TMC+10].
Furthermore, this model has been adopted by many researchers in high
performance computing, data-intensive scientific analysis, large-scale
semantic annotation and machine learning.
A large class of data processing systems using MapReduce runs mainly on
local platforms, such as clusters. However, open systems such as Service
Oriented Architecture, Grid Computing, Volunteer Computing and Cloud
Computing also offer platforms on which to run MapReduce applications.
In particular, Cloud
This work is supported by the French Agence Nationale de la Recherche through
the MapReduce grant under contract ANR-10-SEGI-001-01, as well as INRIA ARC
BitDew.
 