3.2.2 Complex and diverse datasets
Not only is it useful to view information in different forms, gaining a better
understanding by looking at and manipulating the data, but value can also be gained by
algorithmically fusing information from assorted databases. For instance, a second database
may provide information that was missing from the first, errors or discrepancies
may be highlighted through the amalgamation of the data, and new information may be inferred
through the integration of the two datasets.
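The fusion described above can be sketched in a few lines. The following is a minimal, hypothetical example, with invented field names and records for illustration: records from a second source fill gaps in the first, and any conflicting values are collected as discrepancies rather than silently overwritten.

```python
# Hypothetical sketch of fusing two attribute tables keyed by region ID.
# Field names and records are invented for illustration only.

def fuse(primary, secondary):
    """Merge two record sets; fill gaps from `secondary` and flag conflicts."""
    fused, conflicts = {}, []
    for key in set(primary) | set(secondary):
        merged = dict(primary.get(key, {}))
        for field, value in secondary.get(key, {}).items():
            if merged.get(field) is None:
                merged[field] = value                 # second source fills a gap
            elif merged[field] != value:
                conflicts.append((key, field, merged[field], value))  # discrepancy
        fused[key] = merged
    return fused, conflicts

census = {"E01": {"population": 5200, "area_km2": None}}
survey = {"E01": {"population": 5300, "area_km2": 12.4}}
merged, issues = fuse(census, survey)
```

Here the fused record gains the missing `area_km2` from the second source, while the disagreement over `population` surfaces as a flagged discrepancy for the analyst to inspect.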
Furthermore, large and complex geographical datasets often contain missing or erroneous
information. Missing data inevitably affects how the user explores the information, and
the developer must decide how to handle it: ignore it
by pre-filtering it out, substitute an assumed value, or classify it
as missing and treat it explicitly. If missing values are to be included, the data-processing
algorithms may need to be rewritten to handle these cases. For instance, what
happens when three values are averaged and one of them is null? Likewise, erroneous data
can be deleted or flagged as being potentially wrong, but again these concepts need to be
fully integrated into the system, from data-processing and visualization to interaction and
manipulation. If users are to explore the complete dataset, then it is imperative that the
developer integrates methods that deal with this information. It is far too easy to develop
tools that merely ignore this type of data. Often missing or erroneous information can help
the user infer some fact or it can be used to support a hypothesis. Seminal work in this
area is Unwin et al. (1996) and their MANET (Missings Are Now Equally Treated) system.
However, current systems still do little to integrate such uncertain data, and much work
remains before missing information is fully integrated and can be manipulated
appropriately by the user.
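The averaging question raised above can be made concrete with a small sketch. This is one possible treatment, not a prescription: rather than silently dropping or zero-filling nulls, the function returns both the mean of the known values and a count of how many were missing, so the interface can flag the result as uncertain.

```python
# A minimal sketch of a null-aware average. Instead of hiding missing
# values, it reports the mean over the known values together with the
# number of values that were missing.

def average_with_missing(values):
    known = [v for v in values if v is not None]
    n_missing = len(values) - len(known)
    mean = sum(known) / len(known) if known else None
    return mean, n_missing

# Averaging three values when one is null:
mean, missing = average_with_missing([10.0, None, 14.0])
```

A visualization built on such a function can then classify the averaged value as "derived from incomplete data" and render it distinctly, in the spirit of MANET's equal treatment of missings.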
3.2.3 Data processing challenges
The sheer size, complexity and diverse nature of geographical datasets have clear
consequences for exploratory analysis. Researchers have certainly developed clever and
useful algorithms to address many of these issues, but the main limitation is that they are
not commonplace in general-purpose geovisualization tools. For example, parallel algorithms
and the use of remote high-performance computers have their benefits, and much work has
examined such techniques individually and their application to geographical visualization,
yet they are not natively included in many exploratory geovisualization systems. Data
abstraction techniques likewise speed up the processing of large datasets, and a reasonable
amount of research has been done here. However, although visual filtering methods are in
widespread use, there has been little work on tightly integrating data-mining techniques
with exploratory visualization techniques.
Furthermore, additional research on the curation of geographical data is required. That
is, both the temporary datasets produced during an exploration and the operation
history (the commands used in the exploration) should be saved. This would record the
provenance of the exploration and permit other researchers to reproduce and confirm
any results.
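The operation history described above could be captured with something as simple as the following sketch. The command names and parameters are hypothetical; the point is only that each step is timestamped and serializable, so a session can be saved and later replayed to reproduce a result:

```python
# A minimal sketch of recording an operation history for provenance.
# Command names ("load_layer", "filter") are hypothetical examples.

import json
import time

class SessionLog:
    def __init__(self):
        self.ops = []

    def record(self, command, **params):
        """Append one timestamped operation to the history."""
        self.ops.append({"t": time.time(), "command": command, "params": params})

    def save(self, path):
        """Persist the history as JSON so the session can be replayed."""
        with open(path, "w") as f:
            json.dump(self.ops, f, indent=2)

log = SessionLog()
log.record("load_layer", source="census.shp")
log.record("filter", attribute="population", op=">", value=5000)
```

Replaying such a log against the same source data would let another researcher retrace the exploration step by step and confirm any results.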