Towards Privacy for MapReduce on Hybrid Clouds Using Information Dispersal Algorithm - Data Management in Cloud, Grid and P2P Systems

2.2 MapReduce

The principle of MapReduce is to split the data into parts so that they can be processed simultaneously. The model defines two main functions: the Map function processes input chunks, and the Reduce function processes the output of the Map tasks and produces the final results.

Computing elements can be classified into Mappers, which execute Map tasks, and Reducers, which execute Reduce tasks. In the first step, the input data are divided into chunks and distributed over the Mappers. The Mappers then apply the Map function to each chunk. The result of a Map task is list(k, v), a list of key-value pairs. An intermediate Shuffle phase sorts the Map outputs, called intermediate results, by key so that, in the second step, each Reducer processes a subset of the keys. In the Reduce phase, Reducers apply the Reduce function to (k, list(v)), that is, to all of the values associated with a specific key. At the end, all the results are assembled and sent back to the user.
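The Map, Shuffle and Reduce steps described above can be sketched in a few lines of single-process Python. This is only an illustration under stated assumptions: the word-count task and the function names (map_fn, shuffle, reduce_fn, map_reduce) are hypothetical choices made for this example, not part of any MapReduce framework, and a real deployment would run the Map and Reduce tasks on separate nodes.

```python
from collections import defaultdict


def map_fn(chunk):
    # Map: emit one (key, value) pair per word in the chunk (assumed task: word count).
    return [(word, 1) for word in chunk.split()]


def shuffle(intermediate):
    # Shuffle: group the intermediate values by key and sort the groups by key,
    # turning list(k, v) into a sorted list of (k, list(v)).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    # Reduce: combine all the values collected for one key.
    return key, sum(values)


def map_reduce(chunks):
    # Step 1: apply the Map function to every chunk.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_fn(chunk))
    # Step 2: shuffle, then apply the Reduce function per key and assemble the results.
    return dict(reduce_fn(key, values) for key, values in shuffle(intermediate))


result = map_reduce(["a b a", "b c"])
```

Here each string in the input list plays the role of one chunk handed to a Mapper, and the dictionary returned by map_reduce corresponds to the assembled final results.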

3 Our Secure MapReduce Approach Based on IDA

Our approach consists of four phases, illustrated in Figure 1.

Fig. 1. IDA Phases for MapReduce

3.1 Preparation Phase (Split)

A classical MapReduce master splits the input data into chunks based on the chunk size. In our approach, we instead apply the IDA split routine to the input data, which generates n chunks. A chunk M_i is composed of 1) a header containing the key vector of order i (i.e. the i-th row of the key matrix), and 2) a body storing the product of the i-th row of the key matrix and the data. All generated chunks are then dispersed to the Mapper machines.
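The split routine can be illustrated with a minimal Python sketch. Several details here are assumptions made for the example rather than choices stated in the text: the threshold m (the number of data bytes combined per product), the prime modulus P = 257, the Vandermonde construction of the key matrix, zero padding, and the dictionary layout of a chunk. Only the header/body structure of M_i follows the description above.

```python
P = 257  # assumed prime modulus, chosen to exceed every byte value


def key_matrix(n, m):
    # Assumed Vandermonde construction: row i is (x^0, ..., x^(m-1)) with x = i + 1.
    # Distinct x values make any m rows linearly independent modulo P.
    return [[pow(i + 1, j, P) for j in range(m)] for i in range(n)]


def ida_split(data, n, m):
    # Pad with zero bytes so the length is a multiple of m. This is a simplification:
    # a real scheme would also record the original length.
    padded = data + bytes(-len(data) % m)
    # Arrange the data as columns of m bytes each.
    columns = [padded[k:k + m] for k in range(0, len(padded), m)]
    chunks = []
    for row in key_matrix(n, m):
        # Body of chunk M_i: product of the i-th key row with the data, modulo P.
        body = [sum(a * b for a, b in zip(row, col)) % P for col in columns]
        # Header of chunk M_i: the i-th row of the key matrix.
        chunks.append({"header": row, "body": body})
    return chunks


chunks = ida_split(b"abcd", n=3, m=2)
```

With n = 3 and m = 2, the call produces three chunks, each carrying its own key row in the header, which could then be dispersed to three Mapper machines.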
