Compute the Graph
Transforming raw data into a set of nodes and a set of links typically requires
some computation. You can do this via programming, or sometimes
spreadsheet formulas may be sufficient (see the sample spreadsheets on the
accompanying website).
Following the e-mail example, the raw data was accessed via cut and paste
from Outlook to Excel. Transforming the raw e-mail data into a set of nodes
and links required some programming, which will be shown in detail in
Chapter 8. Essentially, for the e-mail data set, the process looked like this
(for each row in the data):
1. Extract each unique node. For example, for the first e-mail, the nodes
are Ben and Zoe.
2. Add these nodes to the node list and set the count (number of e-mails)
to 1. If a node already exists in the node list, instead increment the
count for that node by 1.
3. Each unique pair of nodes within the row is a link. In the second e-mail,
the nodes are Ben, Zoe, and Tim. The unique pairs are Ben-Zoe,
Ben-Tim, and Tim-Zoe. Each of these links must be added to the link
list with a count of 1. If the link already exists, then instead increment
the count for that link by 1.
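The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program shown in Chapter 8: the function name compute_graph and the assumption that each row has already been reduced to a list of participant names are hypothetical. Links are treated as undirected, and names are sorted within each row so that a pair is always stored in the same order.

```python
from collections import Counter
from itertools import combinations


def compute_graph(rows):
    """Count nodes and undirected links from rows of participant names.

    `rows` is assumed to be a list of lists of names, one list per e-mail.
    """
    node_counts = Counter()
    link_counts = Counter()
    for row in rows:
        # Step 1: extract each unique node in the row (sorted for a stable pair order).
        nodes = sorted(set(row))
        # Step 2: add new nodes with a count of 1, or increment existing ones by 1.
        node_counts.update(nodes)
        # Step 3: every unique pair of nodes within the row is a link.
        link_counts.update(combinations(nodes, 2))
    return node_counts, link_counts


# The first two e-mails from the example in the text.
rows = [["Ben", "Zoe"], ["Ben", "Zoe", "Tim"]]
node_counts, link_counts = compute_graph(rows)
```

After these two rows, Ben and Zoe each have a node count of 2 and Tim has a count of 1. The Ben-Zoe link has a count of 2, while Ben-Tim and Tim-Zoe each have a count of 1.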
Note
When processing links, if the links are not directed, then Tim-Zoe and
Zoe-Tim represent the same link, and only one of these pairs should be
in the output link list. Alternatively, if the links are directed, then
Tim-Zoe represents a link from Tim to Zoe, whereas Zoe-Tim
represents a different link from Zoe to Tim—and both pairs can exist in
the output link list.
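The difference described in this note comes down to how the key for each link is built. The sketch below is a hypothetical helper (count_links and its directed flag are illustrative names, not from the book) that assumes the links arrive as (source, target) pairs.

```python
from collections import Counter


def count_links(pairs, directed=False):
    """Count links given as (source, target) pairs."""
    counts = Counter()
    for source, target in pairs:
        if directed:
            # Direction matters: (Tim, Zoe) and (Zoe, Tim) stay separate.
            key = (source, target)
        else:
            # Direction is ignored: sort the names so both orders share one key.
            key = tuple(sorted((source, target)))
        counts[key] += 1
    return counts


pairs = [("Tim", "Zoe"), ("Zoe", "Tim")]
undirected = count_links(pairs)
directed = count_links(pairs, directed=True)
```

With the two sample pairs, the undirected result holds a single Tim-Zoe link with a count of 2, whereas the directed result holds two links, Tim-Zoe and Zoe-Tim, each with a count of 1.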
The results of this computation are two data sets—a set of nodes and a
set of links—exactly the output desired. Although many of the examples
provided in the supplementary data are small (that is, fewer than 10,000
nodes), you can take the same approach with much larger data sets. For data
sets with millions to billions of nodes, the approach can be extended by using
optimized processes, graph databases, and distributed computing.