Database Reference
In-Depth Information
1. combine2 ( m i,j , v j ) = m i,j × v j .
2. combineAlli() i ( x 1 ,…, x n ) = min{ x j | j = 1.. n }.
3. assign ( v i , v new ) = min( v i , v new ).
By repeating this process, component ids of nodes in a component are set to
the minimum node id of the component. We iteratively do the multiplication until
component ids converge. The upper bound of the number of iterations in Hcc is d ,
where d is the diameter of the graph. We notice that because of the small-world
phenomenon, see Section 8.2, the diameter of real graphs is small, and therefore, in
practice, Hcc completes after a small number of iterations. For a recent work with
better practical performance, see [45].
8.4 HADOOP IMPLEMENTATION
Given the main goal of the PEGASUS project is to provide an efficient system to
the user, we discuss different Hadoop implementation approaches, starting out with
a naive implementation and progressing to faster methods for GIM-V . The proposed
versions are evaluated in Section 8.5.
8.4.1 GIM-V base: n aive m ultiPliCation
GIM-V BASE is a two-stage algorithm whose pseudocode is in Algorithms 8.1 and
8.2. The inputs are an edge file and a vector file. Each line of the edge file has
the form ( id src , id dst , mval ), which corresponds to a nonzero entry in the adjacency
matrix. Similarly, each line of the vector file has the form ( id , vval ), which cor-
responds to an element in vector v . Stage1 performs the combine2 operation
by combining columns of matrix ( id dst of M ) with rows of the vector ( id of V ). The
output of Stage1 are (key, value) pairs where the key is the source vertex id of the
matrix ( id src of M ), and the value is the partially combined result ( combine2 ( mval ,
vval )). This output of Stage1 becomes the input of Stage2 . Stage2 combines
all partial results from Stage1 and updates the vector. The combineAlli() i () and
assign () operations are done in line 15 of Stage2 , where the “self” and “oth-
ers” tags in lines 15 and 21 of Stage1 are needed by Stage2 to distinguish cases
appropriately. We note that in Algorithm 8.4 and 8.5, Output( k , v ) means to output
data with the key k and the value.
8.4.2 GIM-V bl: bl: loCk m ultiPliCation
GIM-V BL is a fast algorithm for GIM-V , which is based on block multiplication.
The main idea is to group elements of the input matrix into blocks/submatrices of
size b by b . Also, we group elements of input vectors into blocks of length b . In
practice, grouping means we place all elements of a group into one line of input file.
Each block contains only nonzero elements of the matrix/vector. The format of a
matrix block with k nonzero elements
Search WWH ::




Custom Search