Visualisation and Exploration of Scientific Data Using Graphs - Data Mining: Theory, Methodology, Techniques, and Applications

2. Be able to access and integrate data from a number of sources. Data of interest typically fall into one of three categories:

- databases within the AADC (e.g. biodiversity, automatic weather stations, and state of the environment reporting databases). These databases are developed and maintained by the AADC, and so have a consistent structure and are directly accessible.

- flat data files (including external remotely sensed environmental data such as sea ice concentration [8], data collected and held by individual scientists, and data files held in the AADC that have not yet been migrated into actively maintained databases).

- web-accessible (external) databases. Several initiatives are under way that will enable scientists to share data across the web (e.g. GBIF [9]).

3. Be web browser-based. A browser-based solution would allow the tool to be integrated with the AADC's existing web pages, and thus allow clients to explore the data sets before downloading them. It would also allow any bandwidth-intensive activities to be carried out at the server end, an important consideration for scientists on Antarctic bases wishing to use the tool.

4. Have an intuitive graphical interface (suitable for a general audience) that would also provide sufficient flexibility for more advanced users (expected to be mostly internal scientists).

5. Be integrated with the existing AADC database structure. To keep the interface as simple as possible, we needed to make use of the existing data structures and environments in the AADC. For example, the AADC keeps a data dictionary, which provides limited semantic information about AADC data, including the measurement scale type (nominal, ordinal, interval, or ratio) of a variable. This information would allow the application to make informed processing decisions (such as which dissimilarity metric or measure of central tendency to use for a particular variable) and thus minimise the complexity of the interface.
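Requirement 5 relies on the measurement scale type driving processing choices automatically. The following is a minimal sketch of that idea, not the tool's actual implementation. It assumes a hypothetical data dictionary mapping variable names to scale types and uses the common textbook pairing of scale with central tendency (mode for nominal, median for ordinal, mean for interval and ratio). For dissimilarity it substitutes a per-variable measure in the style of Gower's general coefficient, scaled to [0, 1]. The `Scale` enum, `DATA_DICTIONARY` entries, and helper names are all illustrative assumptions.

```python
from enum import Enum
from statistics import mean, median_low, mode
from typing import Callable, Optional, Sequence


class Scale(Enum):
    """Measurement scale types as a data dictionary might record them."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Hypothetical data-dictionary entries; the real AADC schema is not shown here.
DATA_DICTIONARY = {
    "species": Scale.NOMINAL,
    "abundance_class": Scale.ORDINAL,  # assumed coded as integer ranks 0..k-1
    "air_temp_c": Scale.INTERVAL,
    "sea_ice_fraction": Scale.RATIO,
}


def central_tendency(scale: Scale) -> Callable[[Sequence], object]:
    """Return a measure of central tendency that is meaningful for the scale."""
    if scale is Scale.NOMINAL:
        return mode        # only equality between values is meaningful
    if scale is Scale.ORDINAL:
        return median_low  # order is meaningful; always returns an observed rank
    return mean            # interval/ratio: differences are meaningful


def dissimilarity(scale: Scale, value_range: Optional[float] = None
                  ) -> Callable[[object, object], float]:
    """Return a Gower-style per-variable dissimilarity in [0, 1].

    For ordinal data value_range is (number of ranks - 1); for interval
    and ratio data it is the observed spread (max - min) of the variable.
    """
    if scale is Scale.NOMINAL:
        # Simple matching: identical categories are 0 apart, otherwise 1.
        return lambda a, b: 0.0 if a == b else 1.0
    if not value_range:
        raise ValueError("value_range must be positive for ordered scales")
    # Ordered scales: absolute difference normalised by the variable's range.
    return lambda a, b: abs(a - b) / value_range


def summarise(variable: str, values: Sequence) -> object:
    """Summarise a column using only what the data dictionary says about it."""
    return central_tendency(DATA_DICTIONARY[variable])(values)
```

With this arrangement the user never chooses a statistic: `summarise("species", ["krill", "salp", "krill"])` returns the mode, while `summarise("abundance_class", [0, 1, 2, 3])` returns the lower median rank, because the scale type alone determines which function is applied.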

A large number of software packages and algorithms for graph-based data visualisation have been published, and a selection of graph software is summarised in Table 1 (an exhaustive review of all available graph software is beyond the scope of this paper). Existing software that we were aware of met some, but not all, of our requirements. The key feature that seemed to be missing from available packages was the ability to construct a graph directly from a data source (i.e. to create a graph that provides a graphical portrayal of the information contained in a data source). Two notable exceptions are GGobi [10] and Zoomgraph [11]. However, GGobi is intended as a general-purpose data visualisation system and has relatively limited support for structured (nodes and edges) graphs, while Zoomgraph's graph construction is driven by scripting commands. For our general audience, we wanted graph construction to be driven by a graphical interface, without requiring the user to have any knowledge of scripting or database (e.g. SQL) commands.
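The paragraph above identifies direct construction of a graph from a data source as the missing feature. The chapter does not describe its own construction method at this point, so the following is only a hypothetical sketch of one simple approach. Pick two columns of a table, make each distinct value a node, and weight the edge between two values by the number of records that contain both. The column names (`site`, `species`), the `Graph` class, and the `graph_from_records` helper are illustrative assumptions, not part of the described tool.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Mapping, Set, Tuple

Node = Tuple[str, str]  # (column name, value), so equal values in different columns stay distinct


@dataclass
class Graph:
    """Minimal weighted graph: a set of nodes plus edge weights."""
    nodes: Set[Node] = field(default_factory=set)
    edges: Counter = field(default_factory=Counter)  # (Node, Node) -> weight


def graph_from_records(records: Iterable[Mapping[str, str]],
                       col_a: str, col_b: str) -> Graph:
    """Build a bipartite graph linking values of col_a to values of col_b.

    Each distinct value becomes a node tagged with its column name; each
    record adds 1 to the weight of the edge between its two values.
    """
    g = Graph()
    for rec in records:
        a: Node = (col_a, rec[col_a])
        b: Node = (col_b, rec[col_b])
        g.nodes.update((a, b))
        g.edges[(a, b)] += 1
    return g


if __name__ == "__main__":
    # Hypothetical survey records: which species were recorded at which site.
    records = [
        {"site": "S1", "species": "krill"},
        {"site": "S1", "species": "salp"},
        {"site": "S2", "species": "krill"},
        {"site": "S1", "species": "krill"},
    ]
    g = graph_from_records(records, "site", "species")
    for (a, b), weight in sorted(g.edges.items()):
        print(a[1], "--", b[1], "weight", weight)
```

Here the user's only input is the choice of two columns, which a graphical interface could present as drop-down lists; no scripting or SQL is exposed. The edge weights could then be mapped to line thickness by whatever layout and drawing component the tool uses.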

This paper describes a prototype tool that implements the requirements listed above. The key novelty of this tool is the ability to rapidly generate a graph
