or suffixes (for example, “John Doe (Email)”). These need to be
consolidated into a single record.
Duplicate nodes —Within the node data set, each node should appear
only once. For example, “Zoe Jones” should occur only one time. If
multiple “Zoe Jones” occur in the data and all refer to the same Zoe
Jones, these should be aggregated into a single record. If two different
people named Zoe Jones are employed, then each node should be identified
with a unique identifier (for example, an e-mail address or employee number).
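As a minimal sketch of the node consolidation described above, the following Python keys each node on a unique identifier rather than on the display name. The sample records, the e-mail addresses, and the helper name dedupe_nodes are invented for illustration and are not taken from the data set.

```python
# Hypothetical raw node records: (display name, unique identifier).
raw_nodes = [
    ("Zoe Jones", "zoe.jones@example.com"),
    ("Zoe Jones", "zoe.jones@example.com"),  # same person, repeated
    ("Zoe Jones", "zjones2@example.com"),    # a different Zoe Jones
    ("Tim Doe", "tim.doe@example.com"),
]


def dedupe_nodes(records):
    """Keep one record per unique identifier, preserving first-seen order."""
    nodes = {}
    for name, uid in records:
        # setdefault stores the name only the first time an identifier is seen.
        nodes.setdefault(uid, name)
    return nodes


nodes = dedupe_nodes(raw_nodes)
for uid, name in nodes.items():
    print(uid, name)
```

Keying on the identifier merges the repeated record while keeping the two different people named Zoe Jones as separate nodes.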
Duplicate links —Some types of graph visualization and analysis
software do not work well with many links between the same pair of
nodes, and these must be consolidated. It is quite common to have
many links in the data between the same pair of nodes based on
additional attributes. For example, in the Flight_Stats data set
provided in the Supplementary Material on this topic's companion
website, there may be multiple flights on a given day between a pair of
cities at different times, on different airlines. If the objective is to
understand the number of flights between each city pair, these must be
consolidated down to a single link for that city pair. Alternatively, if the
objective is to analyze each of the different carrier networks, the
different links must be maintained, and the analysis tools chosen must
handle multiple links between points.
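The two options above can be sketched in a few lines of Python. The rows below are invented sample data, and the field layout (origin, destination, airline, departure time) is an assumption rather than the actual schema of the Flight_Stats data set.

```python
from collections import Counter

# Hypothetical flight rows: (origin, destination, airline, departure time).
flights = [
    ("BOS", "ORD", "AA", "08:00"),
    ("BOS", "ORD", "UA", "09:30"),
    ("BOS", "ORD", "AA", "17:15"),
    ("ORD", "DEN", "UA", "11:45"),
]

# Option 1: one link per city pair, with the flight count as the link weight.
flights_per_pair = Counter((o, d) for o, d, _airline, _time in flights)

# Option 2: one link per (city pair, airline), so that each carrier
# network can still be analyzed separately.
flights_per_carrier = Counter((o, d, a) for o, d, a, _time in flights)

for (origin, dest), count in flights_per_pair.items():
    print(origin, dest, count)
```

Storing the count as a weight keeps the information about how many flights exist while leaving a single link per city pair for software that cannot handle parallel links.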
Self-loop —A link that connects a node to itself is a self-loop.
In the third e-mail of the previous example, Tim has sent an e-mail to
Ben and Zoe, but also Cc'd himself, thus creating a self-loop. Self-loops
may not be relevant to the analytic objectives, and some graph software
does not handle them.
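A minimal sketch of detecting and dropping self-loops, assuming the e-mails have already been flattened into (sender, recipient) pairs. The link list below is an illustrative reconstruction of the third e-mail, not the original data.

```python
# Hypothetical directed links derived from one e-mail: (sender, recipient).
links = [("Tim", "Ben"), ("Tim", "Zoe"), ("Tim", "Tim")]

# A self-loop is a link whose two endpoints are the same node.
self_loops = [(src, dst) for src, dst in links if src == dst]
links_without_loops = [(src, dst) for src, dst in links if src != dst]

print(self_loops)
print(links_without_loops)
```

Keeping the removed links in a separate list makes it easy to review them before deciding whether they matter to the analysis.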
Isolated nodes —In the final e-mail shown previously, no From or Cc
is identified. A data set can contain nodes that have no links, and
some graph programs have problems with such unlinked nodes.
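One way to find unlinked nodes is to compare the node list with the set of nodes that appear as a link endpoint. This is a sketch under assumptions: the node "Ann" and the link list are hypothetical.

```python
# Hypothetical node list and link list; "Ann" appears in no link.
nodes = {"Tim", "Ben", "Zoe", "Ann"}
links = [("Tim", "Ben"), ("Tim", "Zoe")]

# Every node that appears as an endpoint of at least one link.
linked = {endpoint for link in links for endpoint in link}

# Isolated nodes are in the node list but never appear in a link.
isolated = sorted(nodes - linked)
print(isolated)
```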
Links pointing to nonexistent nodes —Although this does not
occur in the previous example, in some data sets, a link may be defined
between two nodes, where one of the nodes does not exist in the list of
nodes. This may cause problems with some graph software.
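The reverse check flags links whose endpoints are missing from the node list. The names below are hypothetical, with "Max" standing in for a node that was never defined.

```python
# Hypothetical node list and link list; "Max" is not a defined node.
nodes = {"Tim", "Ben", "Zoe"}
links = [("Tim", "Ben"), ("Zoe", "Max")]

# Links with at least one endpoint that is absent from the node list.
dangling = [(src, dst) for src, dst in links
            if src not in nodes or dst not in nodes]

# The endpoint names that would need to be added (or the links removed).
missing = sorted({n for link in links for n in link} - nodes)

print(dangling)
print(missing)
```

Whether to add the missing nodes or drop the offending links depends on the data source, so the sketch only reports both lists.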
Invalid data —Unfortunately, real-world data often contains fields that
are empty, NULL, or otherwise invalid. A column of