[Figure: the map function takes one order and emits a record for each line item: Nike Shoes (Quantity: 1, Total Price: $400), Shaving Cream (Quantity: 1, Total Price: $12), and Men's Perfume-X (Quantity: 2, Total Price: $56).]
Figure 6-3. Map function applied to customer: Order illustration
To allow a high degree of parallelism, each map function operates on a single record,
independently of all others. The reduce function (Figure 6-4) takes multiple map outputs
with the same key and combines their values to arrive at the final result.
[Figure: the reduce function takes the map outputs for the key Men's Perfume-X from Order ID 1 (Quantity: 10, Total Price: $280), Order ID 2 (Quantity: 6, Total Price: $168), and Order ID X (Quantity: 2, Total Price: $56), and combines them into a single result for Men's Perfume-X (Quantity: 18, Total Price: $504).]
Figure 6-4. Reduce function applied to customer: Order illustration
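The two functions in Figures 6-3 and 6-4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order layout (`order_id`, `line_items`, `product`, `quantity`, `total_price`) and the names `map_order` and `reduce_product` are invented here and do not belong to any particular framework, and the order shown in Figure 6-3 is assumed to be the one labelled X in Figure 6-4.

```python
def map_order(order):
    """Map: look at one order only and emit one (key, value) pair per line item."""
    for item in order["line_items"]:
        yield item["product"], {"quantity": item["quantity"],
                                "total_price": item["total_price"]}


def reduce_product(product, values):
    """Reduce: combine all values emitted for a single key into one total."""
    return {
        "quantity": sum(v["quantity"] for v in values),
        "total_price": sum(v["total_price"] for v in values),
    }


# Quantities and prices are taken from the figures; the dict layout is assumed.
orders = [
    {"order_id": 1, "line_items": [
        {"product": "Men's Perfume-X", "quantity": 10, "total_price": 280}]},
    {"order_id": 2, "line_items": [
        {"product": "Men's Perfume-X", "quantity": 6, "total_price": 168}]},
    {"order_id": "X", "line_items": [
        {"product": "Nike Shoes", "quantity": 1, "total_price": 400},
        {"product": "Shaving Cream", "quantity": 1, "total_price": 12},
        {"product": "Men's Perfume-X", "quantity": 2, "total_price": 56}]},
]

# Collect the map outputs for one key, then reduce them.
perfume_values = [value
                  for order in orders
                  for key, value in map_order(order)
                  if key == "Men's Perfume-X"]
perfume_total = reduce_product("Men's Perfume-X", perfume_values)
print(perfume_total)  # {'quantity': 18, 'total_price': 504}
```

Note that `map_order` never consults another order, which is what lets map calls run independently of one another.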
The map-reduce framework arranges for the map tasks to run on the correct nodes
so that all the data is processed, and for the map outputs to be moved to the reduce
function. In its simplest form, you can think of a map-reduce job as having a single
reduce function: the outputs from all the map tasks running on the various nodes are
aggregated and sent to it.
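The framework's share of the work, in its simplest single-process form, might look like the sketch below. `run_map_reduce` is a hypothetical helper written for this illustration, not an API of any real framework: it runs the map function over every record, gathers the outputs by key, and hands each group to the one reduce function. A real framework does the same across nodes and moves the map outputs over the network.

```python
from collections import defaultdict


def run_map_reduce(records, map_fn, reduce_fn):
    """Hypothetical single-process driver: map every record, group by key, reduce."""
    grouped = defaultdict(list)
    for record in records:                  # each map call sees one record
        for key, value in map_fn(record):
            grouped[key].append(value)      # gather map outputs that share a key
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Assumed record shape: (product, quantity) pairs standing in for line items.
line_items = [("Men's Perfume-X", 10), ("Nike Shoes", 1),
              ("Men's Perfume-X", 6), ("Men's Perfume-X", 2)]

totals = run_map_reduce(
    line_items,
    map_fn=lambda item: [(item[0], item[1])],    # emit (product, quantity)
    reduce_fn=lambda product, quantities: sum(quantities),
)
print(totals)  # {"Men's Perfume-X": 18, 'Nike Shoes': 1}
```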
What optimization options does the map-reduce framework offer? Each reduce
function operates on the results for a single key. This limits what a reducer can do,
because nothing in the reduce function can operate across keys. On the other hand, the
limitation is also a benefit: it allows multiple reducers to run in parallel, each handling different keys.
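Because each reduce call needs only the values for its own key, the calls can be dispatched independently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for reducers running on separate nodes; the grouped input data and the name `reduce_quantities` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_quantities(key, values):
    """Reduce for one key; it never looks at another key's values."""
    return key, sum(values)


# Assumed map outputs, already grouped by key.
grouped = {
    "Men's Perfume-X": [10, 6, 2],
    "Nike Shoes": [1],
    "Shaving Cream": [1],
}

# One reduce task per key; the pool stands in for reducers on different nodes.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(reduce_quantities, key, values)
               for key, values in grouped.items()]
    results = dict(future.result() for future in futures)

print(results)
```

A result that spans keys, such as a grand total over all products, cannot be computed inside `reduce_quantities`; it would need a further pass over the reducer outputs.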
 