Database Reference
In-Depth Information
In the problems of Examples 2.13 and 2.14 , the inputs and outputs were clearly all
present. However, there are other problems where the inputs and/or outputs may not all be
present in any instance of the problem. An example of such a problem is the natural join of
R ( A, B ) and S ( B, C ) discussed in Section 2.3.7 . We assume the attributes A , B , and C each
have a finite domain, so there are only a finite number of possible inputs and outputs. The
inputs are all possible R -tuples, those consisting of a value from the domain of A paired
with a value from the domain of B , and all possible S -tuples - pairs from the domains of B
and C . The outputs are all possible triples, with components from the domains of A , B , and
C in that order. The output ( a, b, c ) is connected to two inputs, namely R ( a, b ) and S ( b, c ).
Figure 2.10 Input-output relationship for matrix multiplication
But in an instance of the join computation, only some of the possible inputs will be
present, and therefore only some of the possible outputs will be produced. That fact does
not influence the graph for the problem. We still need to know how every possible output
relates to inputs, whether or not that output is produced in a given instance.
2.6.4
Mapping Schemas
Now that we see how to represent problems addressable by MapReduce as graphs, we can
define the requirements for a MapReduce algorithm to solve a given problem. Each such
algorithm must have a mapping schema , which expresses how outputs are produced by the
various reducers used by the algorithm. That is, a mapping schema for a given problem
with a given reducer size q is an assignment of inputs to one or more reducers, such that:
(1) No reducer is assigned more than q inputs;
(2) For every output of the problem, there is at least one reducer that is assigned all the
inputs that are related to that output. We say this reducer covers the output.
Search WWH ::




Custom Search