numbers of children under 1 year old mentioned earlier, and the zone boundaries are obtained from
the shapefile also reported earlier. The surface approximation is designed to be smooth, in the sense of
minimising the roughness R defined by Equation 17.1:
R = \int_A \left( \frac{\partial^2 S}{\partial x^2} + \frac{\partial^2 S}{\partial y^2} \right)^2 dA        (17.1)
where
A is the geographical region under study
S is the density, a function of location (x, y), subject to the constraints that the integrated population
totals over the supplied set of zones agree with the empirical counts provided
In practice, this is achieved using a finite element approximation, so that population counts for small
rectangular pixels are computed. These calculations are carried out using the R package pycno
(Brunsdon, 2011).
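As a rough illustration, a minimal sketch of such a calculation is given below. It is not the chapter's actual code: it assumes the zone boundaries have been read into a SpatialPolygonsDataFrame called zones, with a hypothetical column under1 holding the counts of children under 1 year old, and that cell sizes are expressed in map units.

library(pycno)
# Pycnophylactic interpolation of the zone counts onto square pixels of side
# 'celldim' (in map units); the pixel totals within each zone are constrained
# to agree with the supplied counts. 'zones' and 'under1' are hypothetical names.
under1.surface <- pycno(zones, zones$under1, celldim = 100)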
In the example, graphical output from the code is used once again, although this time a 3D surface
is shown rather than a conventional map.
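A hedged sketch of producing such a 3D view is given below, assuming under1.surface is the SpatialGridDataFrame produced in the previous sketch; the helper as.image.SpatialGridDataFrame() from the sp package converts the grid to the x, y, z form expected by persp().

library(sp)
# Convert the interpolated grid to an image list and draw a perspective plot.
surf <- as.image.SpatialGridDataFrame(under1.surface)
persp(surf$x, surf$y, surf$z, theta = 30, phi = 30, expand = 0.5,
      xlab = "Easting", ylab = "Northing", zlab = "Density")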
17.4 IMPLICATIONS FOR GC
The aforementioned examples demonstrate the incorporation of documentation, analysis and data
retrieval in a single document. However, these examples are relatively simple, and an important
question is how practical this approach may be in a more general GC context.
A number of issues arise:
1. GC data sets can sometimes be large or in other ways impractical to reproduce.
2. In GC, times required to run algorithms can be large.
3. For some, the command-based format of R and LaTeX is perceived as difficult to learn:
can combinations of text and code be expressed in other formats?
The first two of these are perhaps the harder to resolve. Each of the three issues is considered in
the following sections.
17.4.1 Dealing with Large and Complex Data Sets
In many GC problems, the idea of data-driven modelling (Solomatine et al., 2008) is used: functional
relationships between the variables in a (typically fairly large) data set are learned by machine-learning
algorithms, without prior theoretical knowledge of the process or system from which the data were
drawn. Clearly, to assess findings from analyses carried out in this way, a thorough knowledge of the
data and its lineage is
extremely important. In particular, if the raw data have been cleaned in some way (e.g. by removing
some observations thought to be unreliable), then it is important to ensure that such modifications
are openly recorded and made available, if results are to be reproduced. These ideas are well illus-
trated in a recent paper by Abrahart et al. (2010), who outline a checklist for providing details of
data sets used in publications and go on to note that
… the detail afforded to descriptions of the data sets which drive the models is often lacking, and
replication of results becomes impossible. Consequently, those who should arguably be the great-
est proponents of the data-driven modelling paradigm are failing to properly address large parts of
its requirements.
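One practical way to meet this requirement is to keep the cleaning steps in the analysis script itself, so that any modifications to the raw data are recorded alongside the results. A minimal sketch is given below; the file name and column name are hypothetical.

# Read the raw data and flag observations treated as unreliable, recording
# how many are removed rather than editing the file silently by hand.
flows <- read.csv("raw_flows.csv")
unreliable <- is.na(flows$discharge) | flows$discharge < 0
cat(sum(unreliable), "observations flagged as unreliable and removed\n")
flows.clean <- flows[!unreliable, ]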