Database Reference
In-Depth Information
both in Mystic River , so they have one degree of separation or a path of length 1. But Benicio
Del Toro has a path of length 2 because he has never been in a movie with Kevin Bacon, but
has been in one with Sean Penn.
You can show these relationships by means of a graph, a set of ordered pairs (N,M) which
describe a connection from N to M.
You can think of a tree (such as a hierarchical filesystem) as a graph with a single source
node or origin, and arcs leading down the tree branches. The set {(top, b1), (top, b2), (b1,c1),
(b1,c2), (b2,c3)} is a tree rooted at top, with branches from top to b1 and b2, b1 to c1 and c2,
and b2 to c3. The elements of the set {top, b1, b2, c1,c2,c3} are called the nodes.
You will find graphs useful in describing relationships between entities. For example, if you
had a collection of emails sent between people in your organization, you could build a graph
where each node represents a person in your organization and an arc would exist between
node a and node b if a sent an email to b or vice versa. It could look like Figure 2-2 .
Figure 2-2. A graph detailing email relationships between people
Giraph is an Apache project to build and extract information from graphs. For example, you
could use Giraph to calculate the shortest distance (number of arc hops) from one node in the
graph to another or to calculate if there was a path between two nodes.
Apache Giraph is derived from a Google project called Pregel and has been used by Face-
book to build and analyze a graph with a trillion nodes, admittedly on a very large Hadoop
cluster. It is built using a technology called Bulk Synchronous Parallel (BSP).
The general notion is that there are a set of “supersteps” in the BSP model. In step zero, the
vertices or nodes are distributed to worker processes. In each following superstep, each of
the vertices iterates through a set of messages it received from the previous superset and
sends messages to other nodes to which it is connected.
In the Kevin Bacon example, each node represents an actor, director, producer, screenwriter,
and so on. Each arc connects two people who are part of the same movie. And we want to
 
Search WWH ::




Custom Search