MapReduce and the New Software Stack - Mining of Massive Datasets

Database Reference

In-Depth Information

□

In the problems of Examples 2.13 and 2.14 , the inputs and outputs were clearly all

present. However, there are other problems where the inputs and/or outputs may not all be

present in any instance of the problem. An example of such a problem is the natural join of

R ( A, B ) and S ( B, C ) discussed in Section 2.3.7 . We assume the attributes A , B , and C each

have a finite domain, so there are only a finite number of possible inputs and outputs. The

inputs are all possible R -tuples, those consisting of a value from the domain of A paired

with a value from the domain of B , and all possible S -tuples - pairs from the domains of B

and C . The outputs are all possible triples, with components from the domains of A , B , and

C in that order. The output ( a, b, c ) is connected to two inputs, namely R ( a, b ) and S ( b, c ).

Figure 2.10 Input-output relationship for matrix multiplication

But in an instance of the join computation, only some of the possible inputs will be

present, and therefore only some of the possible outputs will be produced. That fact does

not influence the graph for the problem. We still need to know how every possible output

relates to inputs, whether or not that output is produced in a given instance.

2.6.4

Mapping Schemas

Now that we see how to represent problems addressable by MapReduce as graphs, we can

define the requirements for a MapReduce algorithm to solve a given problem. Each such

algorithm must have a mapping schema , which expresses how outputs are produced by the

various reducers used by the algorithm. That is, a mapping schema for a given problem

with a given reducer size q is an assignment of inputs to one or more reducers, such that:

(1) No reducer is assigned more than q inputs;

(2) For every output of the problem, there is at least one reducer that is assigned all the

inputs that are related to that output. We say this reducer covers the output.

Search WWH ::

Custom Search

Home