MapReduce and the New Software Stack - Mining of Massive Datasets

Database Reference

In-Depth Information

5 However, we show in Section 2.6.7 that two passes of MapReduce are usually better than one for matrix multiplica-

tion.

6 This algorithm uses much too much communication, but it will serve as a simple example of the Pregel computation

model.

7 Recall that this figure represented functions, not tasks. As a network of tasks, there would be, for example, many tasks

implementing function f , each of which feeds data to each of the tasks for function g and each of the tasks for function

i .

8 This person, or more generally, people with large extended circles of friends, are good people to use to start a market-

ing campaign by giving them free samples.

9 In a typical cluster, there are many switches connecting subsets of the compute nodes, so all the data does not need to

go across a single gigabit switch. However, the total available communication is still small enough that it is not feas-

ible to implement this algorithm for the scale of data we have hypothesized.

10 If q is less than 2 n , then a reducer cannot get even one row and one column, and therefore cannot compute any out-

puts at all.

11 Bit strings have Hamming distance 1 if they differ in exactly one bit position. You may look ahead to Section 3.5.6

for the general definition.

Search WWH ::

Custom Search

Home