Fig. 5. A graph of graphs. Each node represents an entire subgraph — in this case,
a graph of sites linked by a metal attribute. This graph of graphs indicates that the
spatial distributions of copper, lead, iron, and tin are similar, and different to those of
nickel, chromium, and the other metals.
supporting the notion that the differences in the benthic species assemblages of
these bays are related to heavy metal contamination.
Finally, we use a graph of graphs to explore the similarities between the spatial
patterns of the various heavy metals. We generated 11 graphs, one for each metal,
using sites as entities and the metal as attribute data, and then calculated the
pairwise similarities between these graphs. Fig. 5 shows the resultant graph,
in which each node represents an entire site-metal graph, and the edges indicate
the similarities between those graphs. The graph suggests that copper, lead, iron,
and tin are distributed similarly, and that their distribution is different to that
of nickel, chromium, and the other metals. This was confirmed by inspecting
histograms of metal values at each location: values of copper, lead, iron, and tin
were higher at one of the Brown Bay locations (the one closest to the tip) than at
the other, whereas the remaining metals showed similar levels at both Brown Bay
locations.
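The text does not specify how each site-metal graph is built or which graph similarity measure was used. As a minimal sketch under stated assumptions, the Python below links two sites when their min-max normalised values for a metal differ by at most a tolerance, scores each pair of metal graphs by the Jaccard index of their edge sets, and keeps metal pairs above a cut-off as the edges of the graph of graphs. The function names (`site_graph`, `jaccard`, `graph_of_graphs`), the `tolerance` and `min_similarity` parameters, and the toy values for sites A to D are all hypothetical; they are not the survey data or the tool's actual method.

```python
from itertools import combinations

def site_graph(values, tolerance=0.1):
    """Hypothetical site graph for one metal.

    `values` maps site -> concentration. Values are min-max normalised to
    [0, 1]; two sites are linked when their normalised values differ by at
    most `tolerance`. Returns the edge set as unordered site pairs.
    """
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # guard against a metal with a constant value
    norm = {site: (v - lo) / span for site, v in values.items()}
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(norm), 2)
        if abs(norm[a] - norm[b]) <= tolerance
    }

def jaccard(edges_a, edges_b):
    """Jaccard index of two edge sets defined over the same sites."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

def graph_of_graphs(metal_data, tolerance=0.1, min_similarity=0.5):
    """One node per metal; metals whose site graphs are similar are linked."""
    graphs = {m: site_graph(vals, tolerance) for m, vals in metal_data.items()}
    edges = {}
    for m1, m2 in combinations(sorted(graphs), 2):
        sim = jaccard(graphs[m1], graphs[m2])
        if sim >= min_similarity:
            edges[(m1, m2)] = sim
    return edges

# Invented toy concentrations for four sites (not the survey data).
metal_data = {
    "copper": {"A": 10, "B": 11, "C": 50, "D": 52},
    "lead":   {"A": 5,  "B": 6,  "C": 30, "D": 29},
    "nickel": {"A": 8,  "B": 30, "C": 9,  "D": 31},
}
print(graph_of_graphs(metal_data))  # {('copper', 'lead'): 1.0}
```

Because every site-metal graph shares the same node set (the sites), comparing edge sets directly is a simple choice; a weighted alternative would be to correlate edge weights between graphs instead of thresholding them.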
4 Discussion
Graphs have previously been recognised for their value in data mining and
exploratory analyses. However, none of the existing software tools for such
analyses that we were aware of met our requirements. We have outlined a prototype
web-based tool that builds graph structures from data contained in databases
or files, and presents the graphs for visual exploration or algorithmic analysis.
The construction phase requires the user to define the variables that will be
used to form the graph nodes. While there may be certain definitions that are
logical or intuitive in the context of a particular database (for example, it is
probably intuitive to think of species as nodes when exploring a database of
wildlife observations), the nodes can in fact be an arbitrary combination of any
of the available variables. This is a powerful avenue for interaction and flexibility,
as it allows the user to interpret the data from a variety of viewpoints, a key to
successful data mining.
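The text does not say how edges are formed once the node variables are chosen. The Python below is one hypothetical way to realise the idea: a node is the tuple of values of whichever variables the user selects, and two nodes are linked, with a co-occurrence count as the edge weight, whenever they appear in records sharing the same value of a second grouping variable. The names `build_graph`, `node_vars` and `link_var`, the co-occurrence rule, and the wildlife records are illustrative assumptions, not the tool's API or data.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(records, node_vars, link_var):
    """Hypothetical graph builder.

    Each node is the tuple of values of the user-chosen `node_vars`. Two nodes
    are linked once for every distinct `link_var` value under which both occur,
    so the edge weight is a co-occurrence count.
    """
    groups = defaultdict(set)  # link_var value -> nodes seen under it
    for rec in records:
        node = tuple(rec[v] for v in node_vars)
        groups[rec[link_var]].add(node)

    nodes = set().union(*groups.values())
    weights = defaultdict(int)
    for members in groups.values():
        # sorted() gives each unordered pair a canonical (a, b) order;
        # it assumes the node values are mutually comparable.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return nodes, dict(weights)

# Invented example records, in the spirit of a wildlife-observation database.
records = [
    {"site": "S1", "species": "krill", "year": 2001},
    {"site": "S1", "species": "seal", "year": 2001},
    {"site": "S2", "species": "krill", "year": 2002},
    {"site": "S2", "species": "seal", "year": 2002},
    {"site": "S2", "species": "skua", "year": 2002},
]

# Viewpoint 1: species as nodes, linked when observed at the same site.
species_nodes, species_edges = build_graph(records, ("species",), "site")

# Viewpoint 2: (site, year) combinations as nodes, linked by shared species.
sy_nodes, sy_edges = build_graph(records, ("site", "year"), "species")
```

The same records yield a species graph in the first call and a graph of site-year combinations in the second, which mirrors the idea that node definitions can be any combination of the available variables.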
Our interest in graph-based data mining is focused on relatively small graphs
(tens to hundreds of nodes). This is somewhat unusual for graph-based data
mining, which often looks to accommodate graphs of thousands or even millions