The total number of bytes communicated from Map tasks to Reduce tasks is 1,000,000 (for the pictures) times 999,999 (for the replication), times 1,000,000 (for the size of each picture). That's 10^18 bytes, or one exabyte. To communicate this amount of data over gigabit Ethernet would take 10^10 seconds, or about 300 years.
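As a minimal sketch of this arithmetic, the following uses the byte counts from the text and takes gigabit Ethernet as roughly 10^8 bytes per second, the rounding implicit in the 10^10-second figure:

    pictures = 10**6             # number of pictures
    replication = 10**6 - 1      # each picture is sent to 999,999 reducers
    picture_bytes = 10**6        # each picture is about one megabyte

    total_bytes = pictures * replication * picture_bytes   # ~10**18 bytes, one exabyte
    seconds = total_bytes / 10**8                           # ~10**10 seconds over gigabit Ethernet
    years = seconds / (3600 * 24 * 365)                     # on the order of 300 years
    print(f"{total_bytes:.2e} bytes, {seconds:.2e} s, ~{years:.0f} years")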
Fortunately, this algorithm is only one extreme point in a spectrum of possible algorithms. We can characterize these algorithms by grouping pictures into g groups, each of 10^6/g pictures.
The Map Function Take an input element (i, P_i) and generate g − 1 key-value pairs. For each, the key is one of the sets {u, v}, where u is the group to which picture i belongs, and v is one of the other groups. The associated value is the pair (i, P_i).
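A minimal sketch of this Map function, not the book's own code, might look as follows; it assumes pictures are numbered 0 through 999,999 and that picture i belongs to group i % g (any fixed assignment of pictures to groups would do):

    def map_fn(i, P_i, g):
        """Emit g - 1 key-value pairs for the input element (i, P_i)."""
        u = i % g                                  # group of picture i (assumed rule)
        for v in range(g):
            if v != u:
                yield frozenset({u, v}), (i, P_i)  # key is the unordered set {u, v}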
The Reduce Function Consider the key {u, v}. The associated value list will have the 2 × 10^6/g elements (j, P_j), where j belongs either to group u or group v. The Reduce function takes each (i, P_i) and (j, P_j) on this list, where i and j belong to different groups, and applies the similarity function s(P_i, P_j). In addition, we need to compare the pictures that belong to the same group, but we don't want to do the same comparison at each of the g − 1 reducers whose key contains a given group number. There are many ways of handling this problem, but one is as follows. Compare the members of group u at the reducer {u, u+1}, where the "+1" is taken in the end-around sense. That is, if u = g (i.e., u is the last group), then u + 1 is group 1. Otherwise, u + 1 is the group whose number is one greater than u.
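The following is a rough sketch of such a Reduce function, under the same assumed group-assignment rule as the Map sketch; s is the similarity function from the text and is assumed to be supplied by the caller. Groups are numbered 0 through g − 1, so the end-around "+1" becomes (u + 1) % g:

    def reduce_fn(key, values, g, s):
        """key is a set {u, v}; values is the list of (j, P_j) with j in group u or v."""
        group = lambda j: j % g                    # same assignment rule as the Map sketch
        # Compare every pair of pictures drawn from different groups.
        for a, (i, P_i) in enumerate(values):
            for j, P_j in values[a + 1:]:
                if group(i) != group(j):
                    s(P_i, P_j)
        # Compare the members of group u only at the reducer {u, u+1} (end-around),
        # so within-group comparisons are not repeated at the other reducers
        # whose key contains u.
        for u in key:
            if (u + 1) % g in key:
                same = [(i, p) for i, p in values if group(i) == u]
                for a, (i, P_i) in enumerate(same):
                    for j, P_j in same[a + 1:]:
                        s(P_i, P_j)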
We can compute the replication rate and reducer size as a function of the number of groups g. Each input element is turned into g − 1 key-value pairs. That is, the replication rate is g − 1, or approximately r = g, since we suppose that the number of groups is still fairly large. The reducer size is 2 × 10^6/g, since that is the number of values on the list for each reducer. Each value is about a megabyte, so the number of bytes needed to store the input is 2 × 10^12/g.
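These formulas can be collected in a small sketch; the constants (10^6 pictures of about 10^6 bytes each) come from the text, and the function itself is only an illustration:

    def costs(g, n=10**6, picture_bytes=10**6):
        """Replication rate, reducer size, and byte counts as functions of g."""
        r = g - 1                                # replication rate, roughly g
        q = 2 * n // g                           # reducer size: values per reducer
        reducer_input = q * picture_bytes        # about 2 * 10^12 / g bytes
        communication = n * r * picture_bytes    # total bytes from Map to Reduce
        return r, q, reducer_input, communication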
EXAMPLE 2.12 If g is 1000, then the input consumes about 2GB. That's enough to hold everything in a typical main memory. Moreover, the total number of bytes communicated is now 10^6 × 999 × 10^6, or about 10^15 bytes. While that is still a huge amount of data to communicate, it is 1000 times less than that of the obvious algorithm. Moreover, there are still about half a million reducers. Since we are unlikely to have available that many compute nodes, we can divide all the reducers into a smaller number of Reduce tasks and still keep all the compute nodes busy; i.e., we can get as much parallelism as our computing cluster offers us.
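Evaluating the sketch above at g = 1000 reproduces the figures in Example 2.12; the last two lines illustrate one possible (assumed, not from the text) way to bin the roughly half a million reducers into a smaller number of Reduce tasks:

    r, q, reducer_input, communication = costs(1000)
    print(reducer_input)        # 2 * 10**9 bytes, about 2GB of input per reducer
    print(communication)        # about 10**15 bytes communicated in total
    print(1000 * 999 // 2)      # 499,500 keys {u, v}, about half a million reducers
    num_tasks = 100             # arbitrary choice, e.g. one task per compute node
    task_of = lambda key: hash(key) % num_tasks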