Database Reference
In-Depth Information
Note that the output is not exactly a relation, because it has key-value pairs. However, a
relation can be obtained by using only the value components (or only the key components)
of the output.
2.3.5
Computing Projections by MapReduce
Projection is performed similarly to selection, because projection may cause the same tuple
to appear several times, the Reduce function must eliminate duplicates. We may compute
π S ( R ) as follows.
x For each tuple t in R , construct a tuple t ′ by eliminating from t those components whose
attributes are not in S . Output the key-value pair ( t , t ′).
The Reduce Function For each key t ′ produced by any of the Map tasks, there will be one
or more key-value pairs ( t , t ′). The Reduce function turns ( t , [ t , t , . . . , t ′]) into ( t , t ′), so
it produces exactly one pair ( t , t ′) for this key t ′.
Observe that the Reduce operation is duplicate elimination. This operation is associative
and commutative, so a combiner associated with each Map task can eliminate whatever du-
plicates are produced locally. However, the Reduce tasks are still needed to eliminate two
identical tuples coming from different Map tasks.
2.3.6
Union, Intersection, and Difference by MapReduce
First, consider the union of two relations. Suppose relations R and S have the same schema.
Map tasks will be assigned chunks from either R or S ; it doesn't matter which. The Map
tasks don't really do anything except pass their input tuples as key-value pairs to the Re-
duce tasks. The latter need only eliminate duplicates as for projection.
x Turn each input tuple t into a key-value pair ( t, t ).
The Reduce Function Associated with each key t there will be either one or two values.
Produce output ( t, t ) in either case.
To compute the intersection, we can use the same Map function. However, the Reduce
function must produce a tuple only if both relations have the tuple. If the key t has a list of
two values [ t, t ] associated with it, then the Reduce task for t should produce ( t, t ). However,
if the value-list associated with key t is just [ t ], then one of R and S is missing t , so we don't
want to produce a tuple for the intersection.
x Turn each tuple t into a key-value pair ( t, t ).
Search WWH ::




Custom Search