of nodes. Our focus on small graphs is driven by our application to Antarctic
scientific data. Such data are extremely costly to acquire and so many of the
data sets that are of interest to us are of relatively small size (generally, tens
to thousands of observations). Our goal is to obtain maximum insight into the
information provided by these data. This is facilitated by the ability to rapidly
generate a number of graphs and interpret a given data set from a variety of
viewpoints, as noted above. Furthermore, the visualisation tool that we have
chosen to use provides a high degree of interactivity in terms of the layout of
the graph, which further enhances the user's insight into the data. However,
this visualisation tool is best suited to relatively small graphs, as the dynamic
layout algorithm becomes too slow for more than about a hundred nodes on a
standard PC. Other visualisation tools specifically designed for large graphs
(e.g. [19, 26, 27]) might be useful in such cases. FADE [19] and MGV [26]
use hierarchical views that range from the global structure of a graph, with
little local detail, through to local views with full detail. We note that the constraint
on graph size lies with the visualisation tool and not the algorithm that we use
to generate the graph from the underlying data. We have successfully used our
graph generation procedures on a database of wildlife observations comprising
approximately 150,000 observations of 30 variables — quite a large data set by
Antarctic scientific standards!
One notable limitation of our current implementation is the requirement that
attribute data be discrete: edges are only formed between nodes that have an
exact match in one or more attributes. Continuous attributes must be discretised,
which both wastes information and can lead to different graph structures under
different choices of discretisation method. Discretisation is potentially
particularly problematic for Antarctic scientific data sets, which tend to be
not only relatively small but also sparse. Sparsity will lead to few exact
matches in discretised data, and hence to graphs that may have too few edges
to convey useful information. Future development will therefore focus on
continuous attribute data.
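To make this exact-match mechanism concrete, the following Java sketch bins a
single continuous attribute and links every pair of observations falling in
the same bin. It is a minimal illustration rather than our implementation; the
record values, the equal-width binning scheme, and the bin width are all
assumptions chosen for the example.

    import java.util.*;

    /*
     * Minimal sketch of exact-match graph generation (not the authors'
     * implementation): each observation becomes a node, a continuous
     * attribute is discretised into equal-width bins, and an edge joins
     * every pair of observations whose bins coincide.
     */
    public class ExactMatchGraph {

        public static void main(String[] args) {
            // Hypothetical observations: id -> continuous attribute value.
            Map<String, Double> records = new LinkedHashMap<>();
            records.put("obs1", 12.3);
            records.put("obs2", 12.9);
            records.put("obs3", 47.1);
            records.put("obs4", 13.4);

            // Assumed discretisation: equal-width bins of width 5.0.
            double binWidth = 5.0;
            Map<String, Long> bins = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : records.entrySet()) {
                bins.put(e.getKey(), (long) Math.floor(e.getValue() / binWidth));
            }

            // An edge is formed only on an exact match of discretised values.
            List<String> ids = new ArrayList<>(bins.keySet());
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    if (bins.get(ids.get(i)).equals(bins.get(ids.get(j)))) {
                        System.out.println(ids.get(i) + " -- " + ids.get(j));
                    }
                }
            }
        }
    }

With a bin width of 5.0 the sketch links obs1, obs2 and obs4; narrowing the
bins to width 1.0 leaves obs4 isolated, showing how the choice of
discretisation alone can change the structure of the resulting graph.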
Many other packages for graph-based data exploration exist, and we have
incorporated the features of some of these into our design. The GGobi package
[10] has a plugin that allows users to work directly with databases. GGobi
also ties into the open-source statistical package R to provide graph algorithms.
Zoomgraph [11] takes the same approach. This is one method of providing graph
algorithms without the cost of re-implementation. Another is simply to pass the
graph to the user, who can then use one of the many freely available graph
software packages (e.g. [28, 29, 30, 31]). Yet another approach, which we are currently
investigating, is the use of analytical web services. Our development has been
done in Coldfusion, which can make use of Java and can also expose any function
as a web service. This may allow us to deploy functions from an existing Java
graph library such as Jung [31] as a set of web services. This approach would
have the advantage that external users could also make use of the algorithms
by passing their GXL files via web service calls.
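As an illustration of how such a service might be invoked, the following Java
fragment posts a small GXL document to an analysis endpoint over HTTP. The
endpoint URL, the choice of algorithm, and the request/response contract are
hypothetical; only the general pattern of exchanging GXL via web service calls
reflects the approach described above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /*
     * Illustrative client for the web-service approach sketched above.
     * The endpoint URL and its request/response contract are hypothetical;
     * the payload is a minimal GXL document (two nodes, one undirected edge).
     */
    public class GxlServiceClient {

        public static void main(String[] args) throws Exception {
            String gxl =
                "<gxl>"
                + "<graph id=\"g1\" edgemode=\"undirected\">"
                + "<node id=\"n1\"/><node id=\"n2\"/>"
                + "<edge from=\"n1\" to=\"n2\"/>"
                + "</graph>"
                + "</gxl>";

            // Hypothetical endpoint exposing a graph algorithm as a service.
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://example.org/analysis/connectedComponents"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(gxl))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(response.body());
        }
    }

Because GXL is plain XML, a client written in any language with HTTP support
could call such a service in the same way.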
The software discussed in this paper is available from
http://aadc-maps.aad.gov.au/analysis/gb.cfm.