advantages of the data dictionary to be realised for file data. Remote databases
can be accessed using web services. Initially we have provided access only to
GBIF data [9] through the DiGIR protocol. Data from web service sources are
described by XML schema, which can be used in a similar manner to the data
dictionary to provide limited semantic information.
To construct a graph representation of these data, the user must specify which
variables are to be used to form the nodes, and a means of forming edges between
nodes. Nodes are formed from the discrete values (or n-tuples) of one or more
variables in the database. The graphical interface provides a list of available data
sources, and once a data source is selected, a list of all variables provided by
that data source. This information comes from the column names in a user
file or database table, or from the “concepts” list of a DiGIR XML resource
file. Available semantic information is used to decide how to discretise the node
variables. Continuous variables need to be discretised to form individual nodes.
A simple equal-interval binning option is provided for this purpose. Categorical
or ordinal (i.e. discrete) variables need no discretisation, so the discretisation
dialogue is shown only when a continuous variable is selected.
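As an illustration of the equal-interval binning and node formation described above, the following is a minimal sketch rather than the application's own code. The function names `equal_interval_bins` and `node_keys`, and the example variables, are hypothetical.

```python
def equal_interval_bins(values, n_bins):
    """Map each continuous value to the index of one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin is enough.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def node_keys(*columns):
    """Form one node per distinct n-tuple of values across the chosen variables."""
    return sorted(set(zip(*columns)))


# Hypothetical example: one categorical and one continuous node variable.
habitat = ["forest", "forest", "marsh", "marsh", "marsh"]
temperature = [0.0, 2.5, 5.0, 7.5, 10.0]
temperature_bin = equal_interval_bins(temperature, 4)
nodes = node_keys(habitat, temperature_bin)
```

Here the categorical variable is used as-is, while the continuous variable is binned before the tuples are formed.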
Once defined, each node is assigned a set of attribute data. These data are
potentially drawn from all other columns in the database. The graphical interface
allows attribute data to be drawn from a different data source provided that the
sources can be joined using a single variable. More complex joins can be achieved
using text commands. Attribute data are used to create the connectivity of the
graph. Nodes that share attribute values are connected by edges, which are
optionally weighted to reflect the strength of the linkage between the nodes. The
application automatically chooses a weighting scheme that is appropriate to the
attribute data type; this choice can be overridden by the user if desired.
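The edge-forming rule can be sketched as follows. The text does not specify the weighting schemes, so this sketch assumes one simple option: the weight is the number of attribute values two nodes share. The name `build_edges` and the example data are hypothetical.

```python
from itertools import combinations


def build_edges(node_attrs):
    """Connect every pair of nodes that share at least one attribute value.

    node_attrs maps a node identifier to the set of its attribute values.
    Returns a dict mapping (node_a, node_b) to an edge weight, where the
    weight is the count of shared values (an assumed weighting scheme).
    """
    edges = {}
    for a, b in combinations(sorted(node_attrs), 2):
        shared = node_attrs[a] & node_attrs[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical example: sites as nodes, recorded species as attribute data.
attrs = {
    "site_A": {"sp1", "sp2", "sp3"},
    "site_B": {"sp2", "sp3"},
    "site_C": {"sp9"},
}
edges = build_edges(attrs)
```

Comparing every pair of nodes is quadratic in the number of nodes; for large node sets, an index from attribute value to nodes would avoid comparing pairs that share nothing.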
Once data sources and variables have been defined, the application parses
the node attributes to create edges, and builds an XML (in fact GXL, [12])
document that describes the graph. The graph can be either visually explored,
or processed with one of many graph-based analytic algorithms.
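A GXL document of this kind might be assembled as in the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The element names (`gxl`, `graph`, `node`, `edge`, `attr`) follow the GXL format, but the choice of a `label` and a `weight` attribute and the function name `graph_to_gxl` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def graph_to_gxl(nodes, edges, graph_id="g"):
    """Serialise nodes and weighted edges to a GXL string.

    nodes is a list of node names; edges maps (name_a, name_b) to a weight.
    """
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id, edgemode="undirected")
    ids = {}
    for i, name in enumerate(nodes):
        ids[name] = f"n{i}"
        node = ET.SubElement(graph, "node", id=ids[name])
        # Store the human-readable name as a typed GXL attribute (assumed layout).
        attr = ET.SubElement(node, "attr", name="label")
        ET.SubElement(attr, "string").text = str(name)
    for (a, b), weight in edges.items():
        # "from" is a Python keyword, so the attributes are passed as a dict.
        edge = ET.SubElement(graph, "edge", {"from": ids[a], "to": ids[b]})
        attr = ET.SubElement(edge, "attr", name="weight")
        ET.SubElement(attr, "float").text = str(float(weight))
    return ET.tostring(gxl, encoding="unicode")


document = graph_to_gxl(["site_A", "site_B"], {("site_A", "site_B"): 2})
```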
2.2 Graph Visualisation
Graph structures are displayed to the user in an interactive graph browser. The
browser is a modified version of the Touchgraph LinkBrowser [13], which is an
open-source Java tool for graph layout and interaction. Layout is accomplished
using a spring-model method, in which each edge is considered to be a spring,
and the node positions are chosen to minimise the global energy of the spring
system. Nodes also have mutual repulsion in order to avoid overlap in the layout.
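The spring-model idea can be illustrated with a generic force-directed iteration. This is a minimal sketch and not the TouchGraph algorithm: the force laws (linear springs, inverse-square repulsion), the constants, and the name `spring_layout` are all assumptions.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, rest_length=1.0,
                  stiffness=0.1, repulsion=0.05, step=0.1, seed=0):
    """Return a dict of 2-D positions from a simple spring-and-repulsion model."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Mutual repulsion between every pair of nodes, to avoid overlap.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 0.01)  # avoid division by zero
                f = repulsion / dist ** 2
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Each edge acts as a spring pulling its endpoints towards rest_length.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 0.01)
            f = stiffness * (dist - rest_length)
            force[a][0] -= f * dx / dist
            force[a][1] -= f * dy / dist
            force[b][0] += f * dx / dist
            force[b][1] += f * dy / dist
        # Move each node a small step along its net force.
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos


layout = spring_layout(["a", "b"], [("a", "b")], iterations=1000)
```

With these assumed constants, two connected nodes settle slightly further apart than the spring's rest length, because the repulsion term balances the spring at that distance.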
While small graphs can reasonably be displayed in their entirety, large graphs
often cannot be displayed in a comprehensible form on limited screen real estate.
We solve this problem by allowing large graphs to be explored as a dynamic
series of smaller graphs (see below). Alternative approaches, such as
hierarchical views with varying levels of detail, are considered in the discussion.
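One common way to present a large graph as a series of smaller ones is to show only the nodes within a fixed number of hops of a focus node. The sketch below assumes that approach for illustration; it is not taken from the application, and the name `neighbourhood` and the `radius` parameter are hypothetical.

```python
from collections import deque


def neighbourhood(adjacency, focus, radius):
    """Return the set of nodes within `radius` hops of `focus` (breadth-first)."""
    depth = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth)


# Hypothetical example: a simple chain a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
visible = neighbourhood(chain, "a", 2)
```

Changing the focus node or the radius yields a different small subgraph, so the user can move through the full graph one view at a time.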
Interaction with the user is achieved through three main processes: node selection,
neighbourhood adjustment, and edge manipulation. The displayed graph
 