Even with this trivial data set, sorting these lists already reveals some
interesting graph properties: the node that occurs most frequently is Ben,
and the most frequent link is Ben-Zoe.
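The sorting described above can be sketched in Python. The e-mail records below are hypothetical, chosen only so that the counts match the text (four nodes, four links, Ben as the most frequent node, Ben-Zoe as the most frequent link). The names Ann and Tom and all variable names are assumptions, not part of the original data set.

```python
from collections import Counter

# Hypothetical (sender, recipient) pairs; not the original data set.
emails = [
    ("Ben", "Zoe"),
    ("Ben", "Zoe"),
    ("Ben", "Ann"),
    ("Ben", "Tom"),
    ("Zoe", "Ann"),
]

# Count how often each person appears as sender or recipient.
node_counts = Counter(name for pair in emails for name in pair)

# Count how often each sender-recipient pair occurs.
link_counts = Counter(emails)

# most_common() returns entries sorted from highest to lowest count.
top_node, top_node_count = node_counts.most_common(1)[0]
top_link, top_link_count = link_counts.most_common(1)[0]

print(len(node_counts), "nodes;", len(link_counts), "links")
print("Most frequent node:", top_node, top_node_count)
print("Most frequent link:", "-".join(top_link), top_link_count)
```

Here a link is keyed by the ordered (sender, recipient) pair. If direction should not matter, each pair could be sorted before counting.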
Another interesting property is the number of nodes and the number of links.
With four nodes and four links, this is not a fully connected data set. A
fully connected data set (one in which every possible link exists) would have
16 links (4 × 4 node pairs). With four links and four nodes, this cannot be
a hierarchy either, because a single hierarchy always has one fewer link
than the number of nodes.
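These two checks can be written as a short Python sketch. It assumes the counting convention implied by the text, in which a fully connected data set of n nodes has n × n links. The function names are hypothetical.

```python
def max_links(node_count: int) -> int:
    """Links in a fully connected data set, using the n * n convention of the text."""
    return node_count * node_count


def could_be_single_hierarchy(node_count: int, link_count: int) -> bool:
    """A single hierarchy has exactly one fewer link than nodes.

    This is a necessary condition only: a graph can pass this test
    and still contain a cycle or be disconnected.
    """
    return link_count == node_count - 1


nodes, links = 4, 4
print(max_links(nodes))                         # 16
print(links == max_links(nodes))                # False: not fully connected
print(could_be_single_hierarchy(nodes, links))  # False: not a hierarchy
```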
For the 10,000 e-mail data set, some of the interesting properties are as
follows:
• There are 2,500 nodes. With 10,000 e-mails, this means that not every
e-mail goes to a different person, so some people occur multiple
times.
• There are 9,600 links, far fewer than the 2,500 × 2,500 possible
links (that is, a fully connected 2,500-node data set would have more
than 6 million links). The ratio of the number of actual links to the
maximum number of links is called graph density, and if the graph
density is low, the graph is considered a sparse graph.
• The node with the highest count is Michael Johnson with 2,271 e-mails.
Michael is the head of sales in this data set—he Cc's or is Cc'd by many
people because he must coordinate between sales, marketing, technical,
and executive staff.
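The density of this data set follows directly from the node and link counts. The following is a minimal sketch, assuming the n × n maximum used in the text. The function name graph_density is hypothetical, and other definitions of density (for example, ones that exclude self-links) give slightly different values.

```python
def graph_density(node_count: int, link_count: int) -> float:
    """Ratio of actual links to the n * n maximum assumed in the text."""
    return link_count / (node_count * node_count)


nodes, links = 2_500, 9_600
print(nodes * nodes)                # 6250000 possible links
print(graph_density(nodes, links))  # about 0.0015, a sparse graph
```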
Note
Graph statistics will be discussed in more detail at the beginning of
Chapter 4, “Stats and Layout.”
When processing the initial data, it may be useful to filter out some of the
data at this early stage. For example, using an e-mail data set extracted
from one person's e-mail inbox (say, Richard's e-mail) means that every
single e-mail will have Richard in either the To, Cc, or Bcc fields. Later,
when visualizing this data, every single link to Richard will then be drawn
in addition to all the other links, thus creating a potentially cluttered view.
Because it is already known that Richard is the source of the e-mail data,