of nodes. Our focus on small graphs is driven by our application to Antarctic
scientific data. Such data are extremely costly to acquire and so many of the
data sets that are of interest to us are of relatively small size (generally, tens
to thousands of observations). Our goal is to obtain maximum insight into the
information provided by these data. This is facilitated by the ability to rapidly
generate a number of graphs and interpret a given data set from a variety of
viewpoints, as noted above. Furthermore, the visualisation tool that we have
chosen to use provides a high degree of interactivity in terms of the layout of
the graph, which further enhances the user's insight into the data. However,
this visualisation tool is best suited to relatively small graphs, as the dynamic
layout algorithm becomes too slow for more than about a hundred nodes on a
standard PC. Other visualisation tools specifically designed for large graphs
(e.g. [19, 26, 27]) might be useful in such cases. FADE [19] and MGV [26]
use hierarchical views that range from the global structure of a graph, with
little local detail, through to local views with full detail. We note that the constraint
on graph size lies with the visualisation tool and not the algorithm that we use
to generate the graph from the underlying data. We have successfully used our
graph generation procedures on a database of wildlife observations comprising
approximately 150,000 observations of 30 variables — quite a large data set by
Antarctic scientific standards!
One notable limitation of our current implementation is the requirement that
attribute data be discrete: edges are only formed between nodes that have an
exact match in one or more attributes. Continuous attributes must be discretised,
which both wastes information and can lead to different graph structures under
different choices of discretisation method. Discretisation is potentially
particularly problematic for Antarctic scientific data sets, which tend to be
not only relatively small but also sparse. Sparsity will lead to few exact
matches in discretised data, and hence to graphs that may have too few edges
to convey useful information. Future development will therefore focus on
continuous attribute data.
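To make this exact-match mechanism concrete, the following Java sketch bins a
single continuous attribute and links every pair of observations falling in
the same bin. It is a minimal illustration rather than our implementation; the
record values, the equal-width binning scheme, and the bin width are all
assumptions chosen for the example.

    import java.util.*;

    /*
     * Minimal sketch of exact-match graph generation (not the authors'
     * implementation): each observation becomes a node, a continuous
     * attribute is discretised into equal-width bins, and an edge joins
     * every pair of observations whose bins coincide.
     */
    public class ExactMatchGraph {

        public static void main(String[] args) {
            // Hypothetical observations: id -> continuous attribute value.
            Map<String, Double> records = new LinkedHashMap<>();
            records.put("obs1", 12.3);
            records.put("obs2", 12.9);
            records.put("obs3", 47.1);
            records.put("obs4", 13.4);

            // Assumed discretisation: equal-width bins of width 5.0.
            double binWidth = 5.0;
            Map<String, Long> bins = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : records.entrySet()) {
                bins.put(e.getKey(), (long) Math.floor(e.getValue() / binWidth));
            }

            // An edge is formed only on an exact match of discretised values.
            List<String> ids = new ArrayList<>(bins.keySet());
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    if (bins.get(ids.get(i)).equals(bins.get(ids.get(j)))) {
                        System.out.println(ids.get(i) + " -- " + ids.get(j));
                    }
                }
            }
        }
    }

With a bin width of 5.0 the sketch links obs1, obs2 and obs4; narrowing the
bins to width 1.0 leaves obs4 isolated, showing how the choice of
discretisation alone can change the structure of the resulting graph.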
Many other packages for graph-based data exploration exist, and we have
incorporated the features of some of these into our design. The GGobi package
[10] has a plugin that allows users to work directly with databases. GGobi
also ties into the open-source statistical package R to provide graph algorithms.
Zoomgraph [11] takes the same approach. This is one method of providing graph
algorithms without the cost of re-implementation. Another is simply to pass the
graph to the user, who can then use one of the many freely available graph
software packages (e.g. [28, 29, 30, 31]). Yet another approach, which we are currently
investigating, is the use of analytical web services. Our development has been
done in Coldfusion, which can make use of Java and can also expose any function
as a web service. This may allow us to deploy functions from an existing Java
graph library such as Jung [31] as a set of web services. This approach would
have the advantage that external users could also make use of the algorithms
by passing their GXL files via web service calls.
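As an illustration of how such a service might be invoked, the following Java
fragment posts a small GXL document to an analysis endpoint over HTTP. The
endpoint URL, the choice of algorithm, and the request/response contract are
hypothetical; only the general pattern of exchanging GXL via web service calls
reflects the approach described above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    /*
     * Illustrative client for the web-service approach sketched above.
     * The endpoint URL and its request/response contract are hypothetical;
     * the payload is a minimal GXL document (two nodes, one undirected edge).
     */
    public class GxlServiceClient {

        public static void main(String[] args) throws Exception {
            String gxl =
                "<gxl>"
                + "<graph id=\"g1\" edgemode=\"undirected\">"
                + "<node id=\"n1\"/><node id=\"n2\"/>"
                + "<edge from=\"n1\" to=\"n2\"/>"
                + "</graph>"
                + "</gxl>";

            // Hypothetical endpoint exposing a graph algorithm as a service.
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://example.org/analysis/connectedComponents"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(gxl))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println(response.body());
        }
    }

Because GXL is plain XML, a client written in any language with HTTP support
could call such a service in the same way.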
The software discussed in this paper is available from
http://aadc-maps.aad.gov.au/analysis/gb.cfm.