Data Modeling Approaches for Big Data and Analytics Solutions - Big Data Imperatives


[Figure: Map Function applied to one order with line items Nike Shoes (Quantity: 1, Total Price: $400), Shaving Cream (Quantity: 1, Total Price: $12), and Men's Perfume -X (Quantity: 2, Total Price: $56)]

Figure 6-3. Map function applied to customer: Order illustration

To allow a high degree of parallelism, map functions operate on a single record, independent of all others. The reduce function (Figure 6-4) takes multiple map outputs with the same key and combines their values to arrive at the final result.
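As a minimal sketch of the map step in Figure 6-3, the Python function below takes one order and emits a (product, value) pair for each line item. The function name `map_order`, the record layout (`order_id`, `line_items`, `product`, `quantity`, `total_price`), and the order ID "X" are hypothetical; the ID assumes this is the same order that appears as Order ID X in Figure 6-4.

```python
from typing import Any, Dict, Iterator, Tuple


def map_order(order: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Emit one (product, value) pair for each line item of a single order.

    Only the order passed in is read, so different orders can be mapped on
    different nodes with no coordination between them.
    """
    for item in order["line_items"]:
        yield item["product"], {
            "order_id": order["order_id"],
            "quantity": item["quantity"],
            "total_price": item["total_price"],
        }


# The order from Figure 6-3; the field names and the ID "X" are assumptions.
order_x = {
    "order_id": "X",
    "line_items": [
        {"product": "Nike Shoes", "quantity": 1, "total_price": 400},
        {"product": "Shaving Cream", "quantity": 1, "total_price": 12},
        {"product": "Men's Perfume -X", "quantity": 2, "total_price": 56},
    ],
}

for key, value in map_order(order_x):
    print(key, value)
```

Each emitted key is a product name, so the pairs produced by many orders can later be grouped by product before they reach a reduce function.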

[Figure: Reduce Function combining the Men's Perfume -X values from Order ID 1 (Quantity: 10, Total Price: $280), Order ID 2 (Quantity: 6, Total Price: $168), and Order ID X (Quantity: 2, Total Price: $56) into Quantity: 18, Total Price: $504]

Figure 6-4. Reduce function applied to customer: Order illustration
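A matching sketch of the reduce step in Figure 6-4 follows. The hypothetical `reduce_product` receives every value emitted for one key and sums the quantities and prices; the input dictionaries use an assumed layout, with the numbers taken from the figure. The order IDs are carried in the values but ignored, because the reduce aggregates across orders.

```python
from typing import Any, Dict, Iterable


def reduce_product(key: str, values: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine all map outputs that share one key (here, a product name)."""
    total_quantity = 0
    total_price = 0
    for value in values:
        total_quantity += value["quantity"]
        total_price += value["total_price"]
    return {"product": key, "quantity": total_quantity, "total_price": total_price}


# Map outputs for "Men's Perfume -X" from three different orders (Figure 6-4).
perfume_values = [
    {"order_id": "1", "quantity": 10, "total_price": 280},
    {"order_id": "2", "quantity": 6, "total_price": 168},
    {"order_id": "X", "quantity": 2, "total_price": 56},
]

result = reduce_product("Men's Perfume -X", perfume_values)
print(result)  # {'product': "Men's Perfume -X", 'quantity': 18, 'total_price': 504}
```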

The map-reduce framework arranges for map tasks to run on the correct nodes to process all the data sets, and for the data to be moved to the reduce function. In its simplest form, you can think of a map-reduce job as having a single reduce function: the outputs from all the map tasks running on the various nodes are aggregated together and sent to that reduce function.
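The simplest form described above, in which every map output is gathered in one place and handed to a single reducer, can be sketched as follows. Each inner list passed to the hypothetical `run_job` stands in for the orders stored on one node; a real framework would run those map tasks on the nodes themselves and ship only their output. The sample orders reuse the line items visible in Figures 6-3 and 6-4 and are otherwise hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterator, List, Tuple

Value = Dict[str, int]


def map_order(order: Dict[str, Any]) -> Iterator[Tuple[str, Value]]:
    # Emit one (product, value) pair per line item, looking only at this order.
    for item in order["line_items"]:
        yield item["product"], {"quantity": item["quantity"], "total_price": item["total_price"]}


def reduce_product(key: str, values: List[Value]) -> Value:
    # Sum every value emitted for one product.
    return {
        "quantity": sum(v["quantity"] for v in values),
        "total_price": sum(v["total_price"] for v in values),
    }


def run_job(nodes: List[List[Dict[str, Any]]]) -> Dict[str, Value]:
    # Map phase: each inner list stands in for the orders stored on one node.
    map_outputs: List[Tuple[str, Value]] = []
    for orders_on_node in nodes:
        for order in orders_on_node:
            map_outputs.extend(map_order(order))
    # Gather all map outputs in one place and group them by key.
    grouped: DefaultDict[str, List[Value]] = defaultdict(list)
    for key, value in map_outputs:
        grouped[key].append(value)
    # Single reduce phase: one reduce call per key, all in this one process.
    return {key: reduce_product(key, values) for key, values in grouped.items()}


# Hypothetical orders spread over two "nodes"; only the items shown in the figures.
nodes = [
    [{"order_id": "1", "line_items": [{"product": "Men's Perfume -X", "quantity": 10, "total_price": 280}]}],
    [
        {"order_id": "2", "line_items": [{"product": "Men's Perfume -X", "quantity": 6, "total_price": 168}]},
        {"order_id": "X", "line_items": [
            {"product": "Nike Shoes", "quantity": 1, "total_price": 400},
            {"product": "Shaving Cream", "quantity": 1, "total_price": 12},
            {"product": "Men's Perfume -X", "quantity": 2, "total_price": 56},
        ]},
    ],
]

print(run_job(nodes)["Men's Perfume -X"])  # {'quantity': 18, 'total_price': 504}
```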

What optimization options do we have for the map-reduce framework? Each reduce function operates on the results of a single key, which is a limitation: you can't do anything in the reduce function that operates across keys. On the other hand, this limitation is actually a good thing. Because values for different keys never meet in the same reduce call, the keys can be split among multiple reducers that run in parallel.
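Because no reduce call needs values from two different keys, the keys can be divided among several reducers. The sketch below assumes hash partitioning of keys (a common scheme, chosen here for illustration rather than taken from the text) and uses a thread pool as a stand-in for separate reducer machines. The names `partition`, `run_reducer`, and `parallel_reduce` are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import DefaultDict, Dict, List, Tuple

Value = Dict[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Pick a reducer for a key; crc32 is stable across runs, unlike hash() on str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_product(key: str, values: List[Value]) -> Value:
    # Sum every value emitted for one key.
    return {
        "quantity": sum(v["quantity"] for v in values),
        "total_price": sum(v["total_price"] for v in values),
    }


def run_reducer(groups: Dict[str, List[Value]]) -> Dict[str, Value]:
    # One reducer handles every key assigned to its partition, one key at a time.
    return {key: reduce_product(key, values) for key, values in groups.items()}


def parallel_reduce(map_outputs: List[Tuple[str, Value]], num_reducers: int) -> Dict[str, Value]:
    # Route each (key, value) pair to its reducer's partition, grouped by key.
    partitions: List[DefaultDict[str, List[Value]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        partitions[partition(key, num_reducers)][key].append(value)
    # Partitions share no keys, so the reducers can run concurrently.
    results: Dict[str, Value] = {}
    with ThreadPoolExecutor(max_workers=num_reducers) as pool:
        for partial in pool.map(run_reducer, partitions):
            results.update(partial)
    return results


map_outputs = [
    ("Men's Perfume -X", {"quantity": 10, "total_price": 280}),
    ("Men's Perfume -X", {"quantity": 6, "total_price": 168}),
    ("Nike Shoes", {"quantity": 1, "total_price": 400}),
    ("Shaving Cream", {"quantity": 1, "total_price": 12}),
    ("Men's Perfume -X", {"quantity": 2, "total_price": 56}),
]
print(parallel_reduce(map_outputs, num_reducers=2))
```

Since every pair with a given key lands in the same partition, the merged result is the same whatever the number of reducers; only the division of work changes.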
