Database Reference
In-Depth Information
the adjacency matrix representation of a graph, and therefore we are going to assume
in the following that
m
=
n
, unless otherwise noticed.
There are three types of operations in the previous formula:
1.
combine2
: multiply
m
i,j
and
v
j
.
2.
combineAll
: sum
n
multiplication results for node
i
.
3.
assign
: overwrite the previous value of
v
i
with the new result to make
′
v
i
.
We introduce an abstraction of the basic matrix-vector multiplication, called
generalized iterative matrix-vector multiplication. The corresponding programming
primitive is the
GIM-V
primitive on which PEGASUS is based. The “Iterative” in
GIM-V
denotes that we apply the ×
G
operation until a convergence criterion is met.
Specifically, let us define the operator ×
G
as follows:
v
′ =
M
×
G
v
where
′
v
i
=
assign
(
v
i
,
combineAll
i
({
x
j
|
j
= 1..
n
, and
x
j
=
combine2
(
m
i,j
,
v
j
)})).
The functions
combine2
(),
combineAll
(), and
assign
() have the follow-
ing interpretation, generalizing the product, sum, and assignment of the traditional
matrix-vector multiplication:
1.
combine2
(
m
i,j
,
v
j
) : combine
m
i,j
and
v
j
.
2.
combineAll
i
(
x
1
,…,
x
n
) : combine all the results from
combine2
() for
node
i
.
3.
assign
(
v
i
,
v
new
) : decide how to update
v
i
with
v
new
.
In the following sections we show how different choices of
combine2
(),
combineAll
i
(), and
assign
() allow us to solve several important graph mining tasks.
Before that, we want to highlight the strong connection of
GIM-V
with SQL. When
combineAll
i
() and
assign
() can be implemented by user defined functions, the
operator ×
G
can be expressed concisely in terms of SQL. This viewpoint is important
when we implement
GIM-V
in large-scale parallel processing platforms, including
Hadoop, if they can be customized to support several SQL primitives including JOIN
and GROUP BY. Suppose we have an
edge
table
E
(
sid, did, val
) and a
vector
table
V(id, val)
, corresponding to a matrix and a vector, respectively. Then, ×
G
corresponds to the SQL statement in Table 8.1. We assume that we have (built-in or
user-defined) functions,
combineAll
i
() and
combine2
(), and we also assume that
the resulting table/vector will be fed into the
assign
() function (omitted, for clarity).
TABLE 8.1
GIM-V
in Terms of SQL
SELECT E.sid,
combineAll
E.sid
(
combine2
(E.val,V.val))
FROM E, V
WHERE E.did=V.id
GROUP BY E.sid
Search WWH ::
Custom Search