Visualisation and Exploration of Scientific Data Using Graphs - Data Mining: Theory, Methodology, Techniques, and Applications


Fig. 5. A graph of graphs. Each node represents an entire subgraph, in this case a graph of sites linked by a metal attribute. This graph of graphs indicates that the spatial distributions of copper, lead, iron, and tin are similar, and different to those of nickel, chromium, and the other metals.

supporting the notion that the differences in the benthic species assemblages of these bays are related to heavy metal contamination.

Finally, we use a graph of graphs to explore the similarities between the spatial patterns of the various heavy metals. We generated 11 graphs, one for each metal, using sites as entities and the metal as attribute data. The pairwise similarities between each of these graphs were calculated. Fig. 5 shows the resultant graph, in which each node represents an entire site-metal graph, and the edges indicate the similarities between those graphs. The graph suggests that copper, lead, iron, and tin are distributed similarly, and that their distribution is different to that of nickel, chromium, and the other metals. This was confirmed by inspecting histograms of metal values at each location: values of copper, lead, iron, and tin were higher at one of the Brown Bay locations (the one closest to the tip) than the other, whereas the remaining metals showed similar levels at each of the two Brown Bay locations.
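
The graph-of-graphs construction described above can be sketched as follows. This is a minimal illustration only, not the tool's implementation: the text does not say how edges within each site-metal graph were defined or which graph similarity measure was used. The sketch therefore assumes that two sites are linked when their values for a metal differ by no more than a threshold, and that two graphs are compared by the Jaccard index of their edge sets. The function names, the site and metal labels, and the numeric values are all hypothetical toy inputs.

```python
from itertools import combinations


def site_graph(values, threshold):
    """Edge set linking sites whose values differ by at most `threshold`.

    Assumption: this linking rule is illustrative; the source does not
    specify how site-metal graphs were built.
    """
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(values), 2)
        if abs(values[a] - values[b]) <= threshold
    }


def jaccard(edges_a, edges_b):
    """Jaccard index of two edge sets (assumed similarity measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def graph_of_graphs(graphs, min_similarity):
    """Link every pair of graphs whose similarity reaches `min_similarity`."""
    meta_edges = {}
    for m1, m2 in combinations(sorted(graphs), 2):
        similarity = jaccard(graphs[m1], graphs[m2])
        if similarity >= min_similarity:
            meta_edges[(m1, m2)] = similarity
    return meta_edges


# Invented toy values: one dictionary of site -> concentration per metal.
metal_values = {
    "metal_a": {"s1": 1.0, "s2": 1.2, "s3": 5.0, "s4": 5.1},
    "metal_b": {"s1": 2.0, "s2": 2.1, "s3": 9.0, "s4": 9.3},
    "metal_c": {"s1": 1.0, "s2": 4.0, "s3": 1.1, "s4": 4.2},
}

# One site graph per metal, then one meta-graph over those graphs.
graphs = {metal: site_graph(vals, threshold=0.5) for metal, vals in metal_values.items()}
meta_edges = graph_of_graphs(graphs, min_similarity=0.5)
print(meta_edges)
```

With these toy values, metal_a and metal_b link the same pairs of sites and so are joined in the graph of graphs, while metal_c links different pairs and is left unconnected.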

4 Discussion

Graphs have previously been recognised for their value in data mining and exploratory analyses. However, existing software tools for such analyses (that we were aware of) did not meet our requirements. We have outlined a prototype web-based tool that builds graph structures from data contained in databases or files, and presents the graphs for visual exploration or algorithmic analysis.

The construction phase requires the user to define the variables that will be used to form the graph nodes. While there may be certain definitions that are logical or intuitive in the context of a particular database (for example, it is probably intuitive to think of species as nodes when exploring a database of wildlife observations), the nodes can in fact be an arbitrary combination of any of the available variables. This is a powerful avenue for interaction and flexibility, as it allows the user to interpret the data from a variety of viewpoints, a key to successful data mining.
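
The idea that nodes can be any combination of the available variables can be illustrated with a short sketch. It is not the tool's code: the function `build_nodes`, the field names, and the observation records are hypothetical and chosen only to mirror the wildlife example above. Each distinct combination of values for the chosen variables becomes one node, and the records sharing that combination are attached to it.

```python
from collections import defaultdict


def build_nodes(records, node_vars):
    """Group records into nodes keyed by the values of `node_vars`.

    Hypothetical helper: every distinct tuple of values for the chosen
    variables defines one node.
    """
    nodes = defaultdict(list)
    for record in records:
        key = tuple(record[var] for var in node_vars)
        nodes[key].append(record)
    return dict(nodes)


# Invented wildlife observations, for illustration only.
observations = [
    {"species": "penguin", "site": "A", "year": 2001},
    {"species": "penguin", "site": "B", "year": 2001},
    {"species": "seal", "site": "A", "year": 2002},
]

# The same records viewed through two different node definitions.
by_species = build_nodes(observations, ["species"])
by_species_site = build_nodes(observations, ["species", "site"])

print(sorted(by_species))
print(sorted(by_species_site))
```

Changing only the list of variables passed as `node_vars` yields a different set of nodes from the same data, which is the kind of viewpoint switching the paragraph describes.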

Our interest in graph-based data mining is focused on relatively small graphs (tens to hundreds of nodes). This is somewhat unusual for graph-based data mining, which often looks to accommodate graphs of thousands or even millions
