most part poorly understood and poorly supported; this is equally true for computational, statistical and visual approaches. The techniques used, the results produced and the validation and interpretation of these results are very important stages in the science process (e.g. Leedy and Ormrod, 2010), but they are not necessarily well supported by the systems we currently use. On a more positive note, there are already signs that visualisation is starting to address some of these challenges in an effort to move towards defensible and repeatable science (e.g. Thomas and Cook, 2005, 2006). The following list sets out some of the challenges that GeoViz faces as it moves from being an ad hoc process with limited support to a well-defined process with tools that support each stage. The list is written to emphasise the parallels between visual and analytical approaches to GC:
1. Have the right data. Restricting the data attributes imported can help a great deal in limiting the amount of visual comparison and search that is required. But at a deeper level, there is a huge and often unacknowledged role here for the expert geographer, to ensure that the datasets used do indeed contain the potential to support useful discoveries. In situations where explanations are being sought for some pattern, this equates to having a good working knowledge of likely causal factors. It is obvious, but you can only discover relationships between data attributes that you have collected and included in the analysis!
2. Build a useful hypothesis space (search space) in which the likely useful findings are there to be discovered. A hypothesis space is a conceptual idea that describes the range of possibilities open to the researcher using a specific set of methods. It essentially puts boundaries around all the possible ways a search can be constructed to make a discovery. The term is used extensively in the machine learning community to describe the bounds of the search for a hypothesis that minimises some error measure (often only reaching a local minimum) when configuring a learning algorithm, but the same logic also applies here. Each different visualisation produced can be considered a hypothesis, in the loose sense that some combination of data, symbols, visual variables and encoding strategy may lead to the recognition of some interesting artefact in the data. The set of all these possible visualisations defines the total space of discovery that could be explored. The hypothesis space is not simply a by-product of the data chosen; it is also constrained by the choices the user makes when selecting what to visualise and how.
3. Adopt a structured approach to exploration that searches the hypothesis space in a describable or predictable manner. The visual equivalent of a statistical Type II error (a false negative) is to miss an interesting artefact in the data because the data were never graphed in a way that would reveal it. A structured approach may be systematic or pragmatic, but it is important to know which, and thus how much trust to place in the result in terms of errors of omission. Systematic techniques include projection pursuit and grand tour methods (Asimov, 1985; Cook et al., 1995), which take a predictable path through the hypothesis space by performing dimensional reduction and projection on the data in an iterative, automated manner. Pragmatic approaches tend to follow an initial hypothesis or interest that the researcher has and then refine it. This may be practical given the vastness of the hypothesis space, but it can leave large areas of that space unexamined, so potentially interesting or important discoveries might be missed. Most machine learning approaches use stochastic search methods guided by measures such as the gradient of an objective function (gradient ascent, e.g. Noll, 1967) to help set the direction and rate of change from one hypothesis to the next. Though not without its own risks, it can be helpful to think of visual searches in the same manner, as a process of progressive refinement that continues until there is insufficient improvement in the results (cf. information gain, Mitchell, 1997) to make further refinement worthwhile.
4. Avoid what is already known. In almost all cases, the strongest correlations and associations in a dataset will already be known. Yet unless care is taken to avoid them, these signals will tend to dominate the visual display: they will produce the strongest clusters or trends.
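As a rough illustration of item 1, assume (for this sketch only) that every pair of attributes is inspected in its own scatterplot, as in a scatterplot matrix. The number of distinct bivariate views then grows quadratically with the number of attributes $n$:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 10 \Rightarrow 45, \quad n = 20 \Rightarrow 190, \quad n = 40 \Rightarrow 780.
```

Under this assumption, halving the number of attributes cuts the number of pairwise views to roughly a quarter.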
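The hypothesis space of item 2 can be made concrete with a deliberately crude model, assumed here purely for illustration: a candidate visualisation is a choice of k attributes, a distinct visual variable for each, and a chart type. The attribute names, visual variables, chart types and function names below are hypothetical placeholders, not taken from the text.

```python
from itertools import combinations, permutations

# Hypothetical design choices for illustration only; none are taken from the text.
ATTRIBUTES = ["income", "age", "rainfall", "elevation", "population"]
VISUAL_VARIABLES = ["position", "size", "hue", "lightness"]
CHART_TYPES = ["scatterplot", "choropleth", "parallel coordinates"]


def hypothesis_space(attributes, visual_variables, chart_types, k=2):
    """Yield every candidate visualisation as (attribute subset, encoding, chart type).

    Each attribute in a k-subset is mapped to a distinct visual variable,
    so encodings are ordered selections (permutations) of visual variables.
    """
    for subset in combinations(attributes, k):
        for encoding in permutations(visual_variables, k):
            for chart in chart_types:
                yield subset, dict(zip(subset, encoding)), chart


def space_size(attributes, visual_variables, chart_types, k=2):
    """Count the candidate visualisations without storing them all."""
    return sum(1 for _ in hypothesis_space(attributes, visual_variables, chart_types, k))


if __name__ == "__main__":
    print(space_size(ATTRIBUTES, VISUAL_VARIABLES, CHART_TYPES))      # 360
    print(space_size(ATTRIBUTES[:3], VISUAL_VARIABLES, CHART_TYPES))  # 108
```

The model ignores which encodings make sense for which chart type and leaves out scales, colour schemes and filtering, so it is only a lower-bound caricature; even so, five attributes already give 10 × 12 × 3 = 360 candidates.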
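The systematic option in item 3 can be sketched as an interpolated tour, loosely in the spirit of the grand tour methods cited (Asimov, 1985; Cook et al., 1995). This is a simplified stand-in rather than either paper's algorithm: it assumes NumPy, draws random target planes from a fixed seed so the path is reproducible, and moves between them by linear interpolation followed by re-orthonormalisation instead of geodesic interpolation. The function names are hypothetical.

```python
import numpy as np


def orthonormalise(m):
    """Return a matrix with orthonormal columns spanning the same space as m.

    QR column signs are fixed (positive diagonal of R) so the basis varies
    smoothly along the path instead of flipping between frames.
    """
    q, r = np.linalg.qr(m)
    return q * np.sign(np.diag(r))


def interpolated_tour(data, n_targets=5, steps=10, d=2, seed=0):
    """Yield (basis, projection) pairs along a reproducible path of d-D projections.

    Simplified: random target planes joined by linear interpolation plus
    re-orthonormalisation, not Asimov's torus method or geodesic paths.
    """
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    current = orthonormalise(rng.standard_normal((p, d)))
    for _ in range(n_targets):
        target = orthonormalise(rng.standard_normal((p, d)))
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            basis = orthonormalise((1.0 - t) * current + t * target)
            yield basis, data @ basis
        current = target
```

Each yielded projection would be drawn as one frame of an animated scatterplot. Because the seed fixes the path, the same sequence of views can be replayed and reported, which is what makes this kind of search describable.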
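Item 3's idea of progressive refinement until there is insufficient improvement can be illustrated with a toy one-dimensional projection pursuit. Assumptions for this sketch: "interestingness" is scored by the absolute excess kurtosis of the projected data (zero for Gaussian data), and the search is a random-perturbation hill climb, a stochastic stand-in for gradient ascent rather than gradient ascent itself. The search stops once the best candidate improves the score by less than a tolerance `tol`. All names and parameter values are hypothetical.

```python
import numpy as np


def kurtosis_index(x):
    """Absolute excess kurtosis of a 1-D projection; 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return abs(np.mean(z ** 4) - 3.0)


def projection_pursuit(data, step=0.3, n_candidates=20, tol=1e-3, max_rounds=200, seed=0):
    """Hill-climb over unit projection vectors, stopping when the gain drops below tol."""
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    a = rng.standard_normal(p)
    a /= np.linalg.norm(a)
    best = kurtosis_index(data @ a)
    history = [best]
    for _ in range(max_rounds):
        # Propose several randomly perturbed directions and keep the best one.
        candidates = a + step * rng.standard_normal((n_candidates, p))
        candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
        scores = [kurtosis_index(data @ c) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] - best < tol:  # insufficient improvement: stop refining
            break
        a, best = candidates[i], scores[i]
        history.append(best)
    return a, best, history
```

Raising `tol` ends the search sooner at the risk of stopping short of a more revealing view, which mirrors the trade-off item 3 describes between search effort and errors of omission.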
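Item 4 does not say here how known signals should be avoided; one common option, shown only as an assumption-laden sketch, is to remove the known relationship first and visualise what is left. Below, a variable is regressed on known explanatory variables by ordinary least squares (NumPy's `lstsq`) and the residuals are returned for plotting. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def remove_known_signal(y, known):
    """Return the residuals of y after an OLS fit (with intercept) on the known variables.

    `known` may be 1-D (one variable) or 2-D (one column per variable).
    """
    X = np.column_stack([np.ones(len(y)), known])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    known = rng.standard_normal(200)   # the already-known, dominant driver
    hidden = rng.standard_normal(200)  # a weaker effect we would like to see
    y = 5.0 * known + 0.5 * hidden + 0.1 * rng.standard_normal(200)
    before = np.corrcoef(y, hidden)[0, 1]
    after = np.corrcoef(remove_known_signal(y, known), hidden)[0, 1]
    print(f"corr with hidden effect: before={before:.2f}, after={after:.2f}")
```

In the synthetic example the known driver swamps a plot of `y` against `hidden`, while the residuals expose the weaker relationship; the same caveat applies as in any partialling-out, namely that the "known" model must itself be trusted.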