Databases Reference
In-Depth Information
If we have to improve on parallelism, we will have to divide the results of the mapper
based on the key on each processing node (typically multiple keys are grouped together
into partitions), then we take data from all the nodes for one partition, combine it into
a single group for that partition and send it to the reducer. Multiple reducers can then
operate on the partitions in parallel to get to the final results merged together (Figure 6-5 ).
This approach sometimes is called “shuffling” and the partitions are referred to as
“buckets” or “regions.”
Parallel Reduce Functions by Partitions
Map Function Output (By Order)
Men's Perfume
-X
280$
Men's Perfume-X
280$
Men's Perfume -X
168$
Men's Perfume -X
168$
Men's Perfume -X
56$
Shaving Cream
12$
Men's Perfume -X
28$
Shaving Cream
48$
Shaving Cream
12$
Men's Perfume -X
56$
Shaving Cream
48$
Nike Shoes
400$
Shaving Cream
24$
Nike Shoes
286$
Men's Perfume -X
28$
Shaving Cream
24$
Nike Shoes
400$
Nike Shoes
286$
Figure 6-5. Parallel reduce functions applied to customer: Order illustration
While we optimized the map-reduce function through parallel reduce functions,
we still see data being moved from node to node between the map and reduce functions.
How can we optimize the process to minimize data movements?
If you notice, much of this data is repetitive, consisting of multiple key-value pairs
for the same key. We can introduce a combiner function to the map-reduce framework,
which will combine all the data for the same key into a single value.
Not
A combiner function in essence is a reducer function.
Map-reduce framework imposes few limitations on the calculations you would
like to perform on the data. Within a map function you can only operate on a single
aggregate, and within a reduce function, you can only operate on a single key (Figure 6-6 ).
Thus, based on your requirement, you will have to design different map-reduce jobs. To
illustrate this aspect, let's look at the requirement: “What is the average ordered quantity
for each product?”
 
 
Search WWH ::




Custom Search