iMapReduce - Large Scale and Big Data: Processing and Management

Database Reference

In-Depth Information

score of v along with the outbound neighbors set is written to DFS for feeding the

next iteration MapReduce job. To stop the iterative process, users have to perform

another MapReduce job after each iteration to measure the difference from the result

of the last iteration.

The map and reduce operations can be described as follows.

Ru

Nu

()

PageRank Map: For each node u , output KVs

vd

,

, where v ∈ N + ( u ),

+

and output the retained ranking score and its outbound neighbors set,

1 −

d

.

+

u

,

Nu

()

V

PageRank Reduce: For each node v , add the retained rank score 1− d

V

and

Ru

Nu

()

the values received from any u d

to update R ( v ), and output KV 〈 v ,

+

[ R ( v ), N + ( v )]〉.

3.1.2.2 K-means

K-means [11] is a commonly used clustering algorithm, which partitions n nodes into

k clusters so that the nodes in the same cluster are more similar than those in other

clusters. We describe the algorithm briefly as follows. (1) Start with selecting k ran-

dom nodes as cluster centroids. (2) Assign each node to the nearest cluster centroid.

(3) Update the k cluster centroids by “averaging” the nodes belonging to the same

cluster centroid. Repeat steps (2) and (3) until convergence has been reached. The

algorithm has converged when the assignments no longer change.

The map and reduce operations for the K-means algorithm can be described as

follows.

K-Means Map: For each key nid (node id), compute the distance from the

node to any cluster centroid, and output the closest cluster id cid along with

the node coordinate information ncoord , that is, output KV 〈 cid , ncoord 〉.

K-Means Reduce: For each key cid (cluster id), update the cluster centroid

coordinate by averaging all the nodes' coordinates that belong to cid , and

output cid along with the cluster's updated centroid coordinate ccoord , that

is, output KV 〈 cid , ncoord 〉.

3.1.2.3 Matrix Power Iteration

Square matrices can be multiplied by themselves repeatedly. This repeated multipli-

k

= ∏ 1 . This iterative

process is called matrix power iteration (MPI). The operation of each iteration is

matrix multiplication, that is, M k = M × N , where N = M k -1 . If M is a matrix with

element m ij in row i and column j , and N is a matrix with element n jk in row j and

k

cation can be described as a power of the matrix, that is, M

M

Large Scale and Big Data: Processing and Management

Search WWH ::

Custom Search

Home