Figure 9-9: The three product subgraphs extracted from the same e-mail
graph. The three largest nodes (upper management) are highlighted in each
subgraph to make visual comparison easier.
When you visually inspect these three subgraphs, you see that the three
largest nodes (Miller, Williams, and Garcia, who are upper management)
are involved in all products. The blue product is the largest subgraph, which
indicates that its conversations tend to be broader and bring in more people
(that is, more connections). For example, a large cluster
on the left side represents a technology customer for that product, as well
as technology-oriented discussions. The purple product contains a number
of thin lines radiating outward. The outermost nodes are customers, and
these conversations are narrowly focused on only a person or two. The
brownish-red product involves few people and contains a cluster above the
management nodes that does not appear in the other two products.
These nodes represent a distributor who is largely responsible for selling
this product.
All the approaches discussed so far require graph software that can
handle multiple links between the same pair of nodes. The rest of this
chapter shows how multiple links can be handled by software that is limited
to only one undirected link or a pair of directed links between nodes. This
is accomplished by transforming links into nodes.
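As a rough illustration of the idea, the following Python sketch replaces every link with a node of its own, so that no pair of nodes is joined by more than one link. The function name links_to_nodes, the naming scheme for link nodes, and the sample links are assumptions made for this example. They are not taken from the e-mail dataset, and the book's own transformation may differ in detail.

```python
def links_to_nodes(links):
    """Replace each labeled link (a, b, label) with a new node joined to a and b.

    Returns a set of undirected edges, each stored as a frozenset of two node
    names, so a pair of nodes can never be joined by more than one edge.
    """
    edges = set()
    for index, (a, b, label) in enumerate(links):
        # Hypothetical naming scheme: one new node per original link.
        link_node = f"{label}:{a}-{b}#{index}"
        edges.add(frozenset((a, link_node)))
        edges.add(frozenset((link_node, b)))
    return edges


# Made-up sample: two people exchange e-mail about two different products,
# which would require two parallel links in a multilink graph.
sample_links = [
    ("Miller", "Williams", "blue"),
    ("Miller", "Williams", "purple"),
    ("Miller", "Garcia", "blue"),
]

simple_edges = links_to_nodes(sample_links)
nodes = set().union(*simple_edges)
```

In this sketch, Miller and Williams are no longer linked directly. They are connected through two separate link nodes, one per product, which software limited to single links can draw.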
Actors and Movies
Another means you can use to analyze larger, complex, multilink graphs is to
transform links into nodes. In this example, consider the Kevin Bacon game
(described in Chapter 4, “Stats and Layout”), where actors are connected to
other actors by the movies in which they co-starred.
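A minimal sketch of this transformation, assuming the input is a list of (movie, actor) pairs: each movie becomes a node linked to its actors, and a breadth-first search counts how many movies separate two actors. The titles, names, and function names below are placeholders invented for illustration, not data from the DBpedia extract.

```python
from collections import defaultdict, deque

# Placeholder (movie, actor) pairs; not real film data.
credits = [
    ("Movie A", "Actor 1"),
    ("Movie A", "Actor 2"),
    ("Movie B", "Actor 2"),
    ("Movie B", "Actor 3"),
    ("Movie C", "Actor 4"),
]


def build_movie_actor_graph(pairs):
    """Build an undirected graph in which movies and actors are both nodes."""
    adjacency = defaultdict(set)
    for movie, actor in pairs:
        # Tag each node with its kind so a movie and an actor sharing a
        # name cannot collide.
        movie_node, actor_node = ("movie", movie), ("actor", actor)
        adjacency[movie_node].add(actor_node)
        adjacency[actor_node].add(movie_node)
    return adjacency


def costar_distance(adjacency, start, goal):
    """Return the number of movies on the shortest path between two actors,
    or None if they are not connected."""
    source, target = ("actor", start), ("actor", goal)
    if source not in adjacency or target not in adjacency:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            # Paths alternate actor, movie, actor, so two hops equal one movie.
            return hops // 2
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


graph = build_movie_actor_graph(credits)
```

Because every link now runs between an actor and a movie, no pair of nodes needs more than one link, even when two actors have appeared together in several films.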
Wikipedia contains data on films, including the stars of each film. Wikipedia's
metadata is organized and accessible via http://dbpedia.org, where queries
can be made interactively using SPARQL, a query language for databases in
the Resource Description Framework (RDF). A sample DBpedia query is shown
in Chapter 8, “Lightweight Programming,” and discussed in more detail in
Chapter 14, “Big Data.” This example is based on a DBpedia query that
extracted a dataset of 20,000 movies and 21,000 actors from Wikipedia. The
raw data is a list of links, movies, and actors, as shown here:
 