Towards Privacy for MapReduce on Hybrid Clouds Using Information Dispersal Algorithm - Data Management in Cloud, Grid and P2P Systems

Database Reference

In-Depth Information

2 Background

2.1 Presentation of IDA

Information Dispersal Algorithm (IDA), first proposed by O. Rabin in 1989

[Rab89], is used to break a file F of length L = |F| into n pieces F i , i in [1 ..n ],

each of length |F i | = L/m ,sothatevery m pieces suce to rebuild F .The

sum of the lengths F i is ( n/m ) ∗ L . With IDA, since any m pieces, among the n

created, can reconstruct the data, loosing parts of data when stocking or routing

can be remedied.

Mainly, IDA is based on two key parameters. As defined, n is the total number

of pieces resulting from IDA, of which any m chunks are sucient to reconstruct

the original file. m is called ”quorum” or ”threshold”. O. Rabin defines the ratio

between m and n as follows: n/m =1+ e ( e> 0), so it ensures greater reliability

of the algorithm in terms of security and storage capacity.

IDA is based on two routines. The split operation generates the n chunks to

be distributed. The combine routine requires m chunks passed as arguments to

reconstruct the original data.

Both algorithms of division (split) and recomposition (combines) operate by

performing a multiplication of the entry and a key matrix, also called transfor-

mation matrix.

The key matrix must have the property that every subset of m different vectors

are linearly independent. If this is not the case, the key matrix can not be inverted

and the combine phase is not feasible. Indeed, the key matrix is generated from

a ”key”. It is defined as a vector V key composed of a list of elements respecting

the following constraints:

V key :( x 1 , ..., x n ,y 1 , ..., y m ),forevery i , j :

(1) x i + y j =0;(2) i = j → x i = x j et y i = y j

A row of the key matrix is:

x i + y 1

x i + y m

a i =(

, ...,

)

First, The key matrix A ( n∗m ) is generated respecting the mathematical con-

straints seen above. The next step is to decompose the input data into sequences.

Each sequence Si is expressed as a vector F i . F i is composed of m elements. An

element is of size w and can be a character ( w = 1), a word, etc.. We will call the

matrix composed of the different vectors F i ,thematrix F . Then, F is transposed

to have F t . The multiplication of A by F t generates a matrix R (IDA output).

A chunk M i is a file composed of 1) a header containing the key vector of order

a i (i.e. the i th row of the key matrix), 2) a body storing R i . M i is a chunk to

send to a mapper.

Once m chunks are collected by a host, the key matrix is obtained from the

m headers. The product result of the inverted key matrix by the content of the

chunks gives the original data.

Data Management in Cloud, Grid and P2P Systems

Search WWH ::

Custom Search

Home