3.5 Requirements
- Ratio between n and m: As m is the number of chunks necessary and
sufficient to reconstruct the input file, it is advantageous to maximize it.
Moreover, a ratio n/m close to 1 keeps the size of the manipulated data close
to the original file size: extracting n chunks of length |F|/m each from a
file of size |F| gives a redundancy of ((n/m) × |F| − |F|) × 100 / |F|
= (n/m − 1) × 100 percent. For example, with n = 2m, we double the source
file (100% redundancy) to generate the n files to scatter.
In our approach, 2n(m − 1) messages are exchanged. Therefore, the choice
of parameters n and m has an impact on the performance of our approach.
if m ≪ n: we reduce communications, but we weaken security and
considerably increase redundancy.
if m ≈ n: we provide better security and maintain an acceptable level of
redundancy, but we increase communications between mappers.
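The trade-off between the two regimes can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names and the sample (n, m) pairs are assumptions chosen for the example, and the formulas are the redundancy percentage (n/m − 1) × 100 and the message count 2n(m − 1) given in the text.

```python
def redundancy_percent(n: int, m: int) -> float:
    """Storage overhead of n chunks of size |F|/m, as a percentage of |F|."""
    return (n / m - 1) * 100


def messages_exchanged(n: int, m: int) -> int:
    """Number of messages exchanged, 2n(m - 1), as stated in the text."""
    return 2 * n * (m - 1)


# Hypothetical parameter choices: m much smaller than n, n = 2m, and m close to n.
for n, m in [(10, 2), (10, 5), (10, 9)]:
    print(f"n={n} m={m} "
          f"redundancy={redundancy_percent(n, m):.1f}% "
          f"messages={messages_exchanged(n, m)}")
```

With n fixed, raising m lowers the redundancy and raises the number of messages, which is the tension described in the two cases above.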
- Mappers allocation: Threats can occur at the mapper itself, when it is
malicious, or during communications, when intruders eavesdrop on the network. A
malicious mapper can access its data, since it is responsible for processing it.
This scenario is allowed in our system. Nevertheless, when a community of at
least m malicious mappers cooperate to reveal their input, they may reveal
all the data, not only their own. To prevent that, we propose to use a hybrid
cloud infrastructure to deploy our solution. On each public cloud, we deploy
m − 1 chunks. This number of chunks is not sufficient to reconstruct all the data.
The remaining chunks are deployed on a private cloud.
There are different scenarios, depending on the existing Cloud
providers, their cost and, most importantly, the confidence placed in each and
the threats that each may face. A first scenario may divide data between
two well-known Clouds, such as Amazon and IBM, because their security techniques
are more reliable, at a relatively higher cost. A second scenario
would choose other, less trusted Clouds, so that, first, the cost is lower and,
second, the user may allow a given level of data visibility, i.e., the number of
potentially untrusted mappers. The user could choose between different scenarios
according to the requirements of the application and its data.
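The allocation rule described above (at most m − 1 chunks on each public cloud, the remainder on a private cloud) can be sketched as follows. The function name, the cloud labels and the choice to fill public clouds in list order are assumptions made for illustration; the text does not specify how many public clouds are used or how chunks are ordered.

```python
def allocate_chunks(n: int, m: int, public_clouds: list[str],
                    private_cloud: str = "private") -> dict[str, list[int]]:
    """Place at most m - 1 chunk indices on each public cloud.

    Chunks that do not fit on the public clouds go to the private cloud,
    so no single public cloud holds enough chunks to rebuild the file.
    """
    placement: dict[str, list[int]] = {}
    next_chunk = 0
    for cloud in public_clouds:
        take = min(m - 1, n - next_chunk)  # never exceed m - 1 per public cloud
        placement[cloud] = list(range(next_chunk, next_chunk + take))
        next_chunk += take
    placement[private_cloud] = list(range(next_chunk, n))  # remaining chunks
    return placement


# Hypothetical setup: n = 10 chunks, m = 4 needed, two public providers.
plan = allocate_chunks(10, 4, ["cloud_a", "cloud_b"])
for cloud, chunks in plan.items():
    print(cloud, chunks)
```

This sketch only enforces the per-cloud bound stated in the text; it does not model collusion between mappers hosted on different public clouds.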
4 Experiments and Evaluation
We have implemented our approach in Perl to manage communication between
mappers and reducers, and we have used the Crypt-IDA 1 library, which is an
implementation of IDA in Perl.
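To illustrate the property the approach relies on (any m of the n chunks rebuild the file, and each chunk has length about |F|/m), here is a toy dispersal scheme in Python. It is a sketch under stated assumptions, not the Crypt-IDA API and not the authors' code: it works over the prime field GF(257) with a Vandermonde matrix, requires n < 257, and the names split, combine and invert are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def split(data: bytes, n: int, m: int) -> list[tuple[int, list[int]]]:
    """Return n shares (x, values); any m of them rebuild `data`."""
    padded = list(data) + [0] * (-len(data) % m)  # pad to a multiple of m
    shares = []
    for x in range(1, n + 1):
        row = [pow(x, j, P) for j in range(m)]  # Vandermonde row for point x
        values = [
            sum(r * b for r, b in zip(row, padded[i:i + m])) % P
            for i in range(0, len(padded), m)
        ]
        shares.append((x, values))
    return shares


def invert(matrix: list[list[int]]) -> list[list[int]]:
    """Invert a square matrix modulo P by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P
                          for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def combine(shares: list[tuple[int, list[int]]], m: int, length: int) -> bytes:
    """Rebuild the original bytes from m shares with distinct x values."""
    chosen = shares[:m]
    inverse = invert([[pow(x, j, P) for j in range(m)] for x, _ in chosen])
    out: list[int] = []
    for k in range(len(chosen[0][1])):
        column = [values[k] for _, values in chosen]
        out.extend(sum(a * v for a, v in zip(row, column)) % P
                   for row in inverse)
    return bytes(out[:length])  # drop the padding added by split


data = b"scatter me"
shares = split(data, n=5, m=3)
restored = combine([shares[4], shares[1], shares[2]], m=3, length=len(data))
print(restored == data)
```

Share values may reach 256, so this toy version does not store shares as bytes; a production library would typically work in a field such as GF(2^8) to avoid that.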
We ran a set of experiments on the Grid'5000 platform using 220 machines
on the Nancy site.
In order to evaluate the performance of our system, we chose to evaluate
the phases according to their locality of execution: the first two phases, Split
and Scatter (step 2S), being executed by the master, the last two, Collect and
1 http://search.cpan.org/~dmalone/Crypt-IDA-0.01/lib/Crypt/IDA.pm
 