Relationships - Graph Analysis and Visualization

Figure 9-9: This shows the three different product subgraphs from within

the same e-mail graph. The three largest nodes (upper management) are

highlighted in each graph to facilitate visual comparison between graphs.

When you visually inspect these three subgraphs, you see that the three

largest nodes are involved in all products—Miller, Williams, and Garcia

are upper management. The blue product is the largest subgraph, which

indicates that its conversations tend to be broader, drawing more people

(that is, more connections) into each conversation. For example, a large cluster

on the left side represents a technology customer for that product, as well

as technology-oriented discussions. The purple product contains a number

of thin lines moving out. The outermost nodes are customers, and

conversations are tightly focused on only a person or two. The brownish-red

product is quite small in terms of people and contains an interesting cluster

above the management, which does not appear in the other two products.

These nodes represent a distributor who is largely responsible for selling

this product.
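
One way to produce per-product subgraphs like these is to group the e-mail links by a product label. The following is a minimal Python sketch of that idea, not the software used for the figure. The link list, the product labels, and every name other than Miller, Williams, and Garcia are made up for illustration.

```python
from collections import defaultdict

# Hypothetical e-mail links: (sender, recipient, product).
# Only Miller, Williams, and Garcia come from the text; the rest is invented.
emails = [
    ("Miller", "Williams", "blue"),
    ("Miller", "Garcia", "blue"),
    ("Garcia", "Chen", "blue"),
    ("Chen", "Patel", "blue"),
    ("Miller", "Williams", "purple"),
    ("Williams", "Lopez", "purple"),
    ("Garcia", "Miller", "purple"),
    ("Garcia", "Miller", "red"),
    ("Williams", "Garcia", "red"),
]


def product_subgraphs(links):
    """Group links by product: {product: {"nodes": set, "edges": list}}."""
    subgraphs = defaultdict(lambda: {"nodes": set(), "edges": []})
    for sender, recipient, product in links:
        sub = subgraphs[product]
        sub["nodes"].update((sender, recipient))
        sub["edges"].append((sender, recipient))
    return dict(subgraphs)


subgraphs = product_subgraphs(emails)

# People who appear in every product subgraph.
in_all_products = set.intersection(*(s["nodes"] for s in subgraphs.values()))

for product, sub in sorted(subgraphs.items()):
    print(product, len(sub["nodes"]), "people,", len(sub["edges"]), "links")
print("In all products:", sorted(in_all_products))
```

Comparing the node and edge counts per product gives a rough numeric counterpart to the visual comparison, and the intersection picks out the people who take part in every product.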

All the approaches discussed so far require using graph software that can

handle many links between nodes. The rest of this chapter shows how

multiple links between nodes can be handled by software that is limited to

only one undirected link or a pair of directed links between nodes. This will

be accomplished by transforming links into nodes.

Actors and Movies

Another means you can use to analyze larger, complex, multilink graphs is to

transform links into nodes. In this example, consider the Kevin Bacon game

(described in Chapter 4, “Stats and Layout”), where actors are connected to

other actors by the movies in which they co-starred.
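
The transformation of links into nodes can be sketched in a few lines of Python. This is an illustrative sketch only: the actor names, movie titles, and the `links_to_nodes` helper are placeholders, not data or code from the book. Each labeled actor-to-actor link is replaced by a movie node that connects to both actors, so no pair of nodes ever needs more than one undirected link.

```python
# Hypothetical multilink data: one tuple per link between two actors,
# labeled with the movie they share. All names and titles are placeholders.
costar_links = [
    ("Actor A", "Actor B", "Movie 1"),
    ("Actor A", "Actor B", "Movie 2"),  # second link between the same pair
    ("Actor A", "Actor C", "Movie 2"),
    ("Actor B", "Actor C", "Movie 2"),
]


def links_to_nodes(links):
    """Replace each labeled actor-actor link with a movie node.

    Returns a set of undirected (actor, movie) edges. Because it is a set,
    any two nodes are joined by at most one edge, which is what software
    limited to single links requires.
    """
    edges = set()
    for actor_1, actor_2, movie in links:
        edges.add((actor_1, movie))
        edges.add((actor_2, movie))
    return edges


edges = links_to_nodes(costar_links)
for actor, movie in sorted(edges):
    print(actor, "--", movie)
```

In this sketch the two parallel links between Actor A and Actor B become two distinct movie nodes, each reached by a single link, and two actors are co-stars whenever they share a neighboring movie node.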

Wikipedia contains data on films including stars in each film. Wikipedia's

metadata is organized and accessible via http://dbpedia.org, where queries

can be made interactively using SPARQL, a query language for data stored in

the Resource Description Framework (RDF). A sample DBpedia query is shown

in Chapter 8, “Lightweight Programming,” and discussed in more detail in

Chapter 14, “Big Data.” This example is based on a DBpedia query that

extracted a dataset of 20,000 movies and 21,000 actors on Wikipedia. The

raw data is a list of links, movies, and actors, as shown here:
