Visualisation and Exploration of Scientific Data Using Graphs - Data Mining: Theory, Methodology, Techniques, and Applications

advantages of the data dictionary to be realised for file data. Remote databases

can be accessed using web services. Initially we have provided access only to

GBIF data [9] through the DiGIR protocol. Data from web service sources are

described by XML schema, which can be used in a similar manner to the data

dictionary to provide limited semantic information.

To construct a graph representation of these data, the user must specify which

variables are to be used to form the nodes, and a means of forming edges between

nodes. Nodes are formed from the discrete values (or n-tuples) of one or more

variables in the database. The graphical interface provides a list of available data

sources, and once a data source is selected, a list of all variables provided by

that data source. This information comes from the column names in a user

file or database table, or from the “concepts” list of a DiGIR XML resource

file. Available semantic information is used to decide how to discretise the node

variables. Continuous variables need to be discretised to form individual nodes.

A simple equal-interval binning option is provided for this purpose. Categorical

or ordinal (i.e. discrete) variables need no discretisation, and so this dialogue is

not shown unless necessary.
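
As an illustration of the equal-interval binning mentioned above, the following is a minimal Python sketch rather than the application's own code. The function name `equal_interval_bins`, the `n_bins` parameter, and the sample elevation values are hypothetical; the range is assumed to run from the minimum to the maximum observed value.

```python
def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous variable (e.g. elevation in metres).
elevations = [0.0, 120.0, 450.0, 980.0, 1000.0]
bins = equal_interval_bins(elevations, 4)
```

Each distinct bin index would then correspond to one node in the graph.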

Once defined, each node is assigned a set of attribute data. These data are

potentially drawn from all other columns in the database. The graphical interface

allows attribute data to be drawn from a different data source provided that the

sources can be joined using a single variable. More complex joins can be achieved

using text commands. Attribute data are used to create the connectivity of the

graph. Nodes that share attribute values are connected by edges, which are

optionally weighted to reflect the strength of the linkage between the nodes. The

application automatically chooses a weighting scheme that is appropriate to the

attribute data type; this choice can be overridden by the user if desired.
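
The edge-forming step can be sketched as follows. This is an assumption-laden illustration: the text does not specify the weighting schemes, so the weight used here (the number of shared attribute values) is only one plausible choice for categorical attributes, and the names `build_edges` and `node_attrs` and the sample data are hypothetical.

```python
from itertools import combinations


def build_edges(node_attrs):
    """Connect nodes that share attribute values; weight = number of shared values."""
    edges = {}
    for a, b in combinations(sorted(node_attrs), 2):
        shared = node_attrs[a] & node_attrs[b]
        if shared:
            # Assumed weighting scheme: count of shared attribute values.
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical nodes (sites) with categorical attribute data (species seen).
node_attrs = {
    "site_A": {"sp1", "sp2", "sp3"},
    "site_B": {"sp2", "sp3"},
    "site_C": {"sp4"},
}
edges = build_edges(node_attrs)
```

Here `site_A` and `site_B` are linked with weight 2, while `site_C` shares no values with the others and remains unconnected.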

Once data sources and variables have been defined, the application parses

the node attributes to create edges, and builds an XML (more precisely, GXL [12])

document that describes the graph. The graph can be either visually explored,

or processed with one of many graph-based analytic algorithms.
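
To give a sense of what such a document might look like, the sketch below writes a small weighted graph as GXL using Python's standard `xml.etree.ElementTree` module. The element names (`gxl`, `graph`, `node`, `edge`, `attr`, `float`) follow the GXL format; the function name `to_gxl`, the graph id, and the attribute name `weight` are assumptions made for illustration and are not taken from the application.

```python
import xml.etree.ElementTree as ET


def to_gxl(nodes, edges, graph_id="g"):
    """Serialise nodes and weighted undirected edges as a GXL string."""
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", id=graph_id, edgemode="undirected")
    for n in nodes:
        ET.SubElement(graph, "node", id=n)
    for (src, dst), weight in edges.items():
        # "from" is a Python keyword, so the attributes are passed as a dict.
        edge = ET.SubElement(graph, "edge", {"from": src, "to": dst})
        attr = ET.SubElement(edge, "attr", name="weight")
        ET.SubElement(attr, "float").text = str(weight)
    return ET.tostring(gxl, encoding="unicode")


doc = to_gxl(["site_A", "site_B"], {("site_A", "site_B"): 2.0})
```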

2.2 Graph Visualisation

Graph structures are displayed to the user in an interactive graph browser. The

browser is a modified version of the Touchgraph LinkBrowser [13], which is an

open-source Java tool for graph layout and interaction. Layout is accomplished

using a spring-model method, in which each edge is considered to be a spring,

and the node positions are chosen to minimise the global energy of the spring

system. Nodes also have mutual repulsion in order to avoid overlap in the layout.
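
The spring-model idea can be illustrated with a generic force-directed sketch. This is not the TouchGraph implementation: the force laws (a linear spring along each edge, inverse-square repulsion between every pair of nodes), the simple iterative update, and all parameter names and values are assumptions chosen for illustration.

```python
import math
import random


def spring_layout(nodes, edges, iterations=500, rest_length=1.0,
                  k_spring=1.0, k_repel=0.5, step=0.1, max_move=0.5, seed=0):
    """Return 2-D positions that approximately minimise the spring-system energy."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Mutual inverse-square repulsion between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k_repel / dist ** 2
                fx, fy = f * dx / dist, f * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Each edge acts as a spring pulling its endpoints toward rest_length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = k_spring * (dist - rest_length)
            fx, fy = f * dx / dist, f * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
        # Move each node along its net force, capping the displacement per step.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            size = math.hypot(mx, my)
            if size > max_move:
                mx, my = mx * max_move / size, my * max_move / size
            pos[n][0] += mx
            pos[n][1] += my
    return pos


# A three-node path: a and c are both linked to b but not to each other.
pos = spring_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

In the resulting layout the linked pairs sit close to the spring rest length, while the unlinked pair is pushed further apart by repulsion.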

While small graphs can reasonably be displayed in their entirety, large graphs

often cannot be displayed in a comprehensible form on limited screen real estate.

We solve this problem by allowing large graphs to be explored as a dynamic

series of smaller graphs (see below). We consider alternative approaches, such as

hierarchical views with varying levels of detail, in the discussion.
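
One way to picture a "smaller graph" drawn from a large one is as the neighbourhood of a selected node. The sketch below assumes that a neighbourhood is defined by hop count from the focus node, which the text does not state explicitly; the function name `neighbourhood`, the `radius` parameter, and the sample adjacency data are hypothetical.

```python
from collections import deque


def neighbourhood(adjacency, focus, radius):
    """Return the set of nodes within `radius` hops of `focus` (breadth-first search)."""
    depth = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            # Do not expand beyond the requested radius.
            continue
        for nb in adjacency.get(node, ()):
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return set(depth)


# Hypothetical path graph a - b - c - d as an adjacency mapping.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
visible = neighbourhood(adjacency, "a", 2)
```

Changing the focus node or the radius yields a different small subgraph, so a large graph can be traversed as a sequence of such views.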

Interaction with the user is achieved through three main processes: node

selection, neighbourhood adjustment, and edge manipulation. The displayed graph
