Geovisualisation as an Analytical Toolbox for Discovery - GeoComputation


most part poorly understood and poorly supported; this is equally true for computational, statistical and visual approaches. The techniques used, the results produced and the validation and interpretation of these results are all important stages in the science process (e.g. Leedy and Ormrod, 2010), but they are not always well supported by the systems we currently use. On a more positive note, there are already signs that visualisation is starting to address some of these challenges in an effort to move towards defensible and repeatable science (e.g. Thomas and Cook, 2005, 2006). The following list sets out some of the challenges that GeoViz faces as it moves from being an ad hoc process with limited support to a well-defined process with tools to back up each stage. The list is written to emphasise the parallels between visual and analytical approaches to GC:

1. Have the right data. Restricting the data attributes imported can help a great deal in limiting the amount of visual comparison and search that is required. But at a deeper level, there is a huge and often unacknowledged role here for the expert geographer in ensuring that the datasets used do indeed have the potential to support useful discoveries. In situations where explanations are being sought for some pattern, this equates to having a good working knowledge of likely causal factors. It is obvious, but you can only discover relationships between data attributes that you have collected and included in the analysis!

2. Build a useful hypothesis space (search space) in which the likely useful findings are there to be discovered. A hypothesis space is a conceptual idea that describes the range of possibilities open to the researcher using a specific set of methods. It essentially puts boundaries around all the possible ways a search can be constructed to make a discovery. The term is used extensively in the machine learning community when describing the bounds of a search for a local minimum when configuring some kind of learning algorithm, but the same logic also applies here. Each different visualisation produced can be considered a hypothesis in the loose sense that some combination of data, symbols, visual variables and encoding strategy may lead to the recognition of some interesting artefact in the data. The set of all these possible visualisations defines the total space of discovery that could be explored. The hypothesis space is not simply a by-product of the data chosen; it is also constrained by the choices the user makes when selecting what to visualise and how.

3. Adopt a structured approach to exploration that searches the hypothesis space in a describable or predictable manner. The visual equivalent of a statistical Type II error is to miss an interesting artefact in the data because the data were never graphed in a way that would reveal it. A structured approach may be systematic or pragmatic. It is important to know which one is being used, and therefore how much trust to place in the result in terms of errors of omission. Systematic techniques include projection pursuit and grand tour methods (Asimov, 1985; Cook et al., 1995). These take a predictable path through the hypothesis space by performing dimensional reduction and projection on the data in an iterative, automated manner. Pragmatic approaches tend to follow an initial hypothesis or interest that a researcher has and then refine it. That might be practical given the vastness of the hypothesis space, but it can leave large areas of this space unexamined, so potentially interesting or important discoveries might be missed. Most machine learning approaches use stochastic search methods guided by heuristics such as gradient ascent (e.g. Noll, 1967) to help set the direction and rate of change from one hypothesis to the next. Though not without its own risks, it can be helpful to think of visual searches in the same manner: as a process of progressive refinement that continues until there is insufficient improvement in the results (cf. information gain, Mitchell, 1997) to make further refinement worthwhile.

4. Avoid what is already known. The strongest correlations and associations in a dataset will be known already in almost all cases. Yet without care to avoid them, these signals will tend to dominate the visual display: they will produce the strongest clusters or trends.
