Geoscience Reference
In-Depth Information
This certainly calls into question any results from simulations making use of this generator.
However, without sight of the source code, it is impossible to see how it deviates from the published
Wichmann and Hill algorithm or how this could be remedied. Although the main focus of this
article is on the steps that may be taken by the authors of articles to ensure reproducibility rather
than on the role of open source code, this example demonstrates why this is also of importance.
17.4.3 Alternatives to LaTeX and R
The final issue listed is the use of alternative formats for either the text or code in literate documents.
This is of key practical importance for the geographical information (GI) community - many of
whom do not typically use either LaTeX or R. If reproducibility is as fundamental a concept as
is being argued here, then ideally, an inclusive approach should be adopted - simply insisting that
all GI practitioners learn new software is unlikely to encourage uptake. On the other hand, it is an
inconvenient truth that many existing GI manipulation and analysis tools simply do not facilitate
reproducible research for a number of reasons. One path forward might be to find a compromise
between these two conflicting positions. A tentative list of suggestions follows.
Embedding alternative data processing languages: A further literate programming tool,
StatWeave (Lenth, 2009), offers more flexibility than Sweave, as it allows a number of different
programming languages and statistics packages to be embedded into the LaTeX markup -
including SAS, Maple, Stata and flavours of Unix shells. Unfortunately, at this stage, there are no
explicit GI processing applications embedded in this way, although StatWeave does offer the
facility to incorporate new engines into its portfolio, so that other command-line-based software
tools or programming languages can be used. A number of possibilities exist here - for example,
incorporating Python would allow much of the functionality of either ArcGIS or QGIS to be
accessed. One such implementation using Python is Pweave (Pastell, 2011).
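The weaving step that these tools share can be illustrated with a minimal Python sketch. It is not the implementation of Pweave, Sweave or StatWeave: it only assumes noweb-style chunk markers (a line beginning `<<...>>=` opens a chunk and a line containing `@` closes it), and the function name `weave`, the pattern `CHUNK` and the elevation values are made up for illustration.

```python
import contextlib
import io
import re

# Assumed chunk syntax: a line starting "<<...>>=", code lines, then a line "@".
CHUNK = re.compile(r"^<<.*?>>=\n(.*?)^@[ \t]*$\n?", re.DOTALL | re.MULTILINE)


def weave(source: str) -> str:
    """Replace each code chunk in `source` with the text that the chunk prints."""
    namespace = {}  # shared, so later chunks can use variables from earlier ones

    def run_chunk(match):
        buffer = io.StringIO()
        # Capture everything the chunk writes to standard output.
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), namespace)
        return buffer.getvalue()

    return CHUNK.sub(run_chunk, source)


# Illustrative document with one embedded chunk (the numbers are invented).
document = """Mean elevation of the sample sites:
<<>>=
elevations = [120.0, 135.5, 128.5]
print(round(sum(elevations) / len(elevations), 1))
@
metres above sea level.
"""

woven = weave(document)
print(woven)
```

Because the reported figure is regenerated from the embedded code each time the document is woven, the text and the analysis cannot drift apart in the way that cut-and-paste workflows allow.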
Using word processing applications instead of LaTeX: For some GI users, a bigger barrier to
adopting either Sweave or StatWeave is the use of LaTeX as a typesetting tool, where they may
have had more experience with word processors. The difficulty is perhaps a move away from a
GUI-based cut-and-paste approach to producing documents. Unfortunately, as argued earlier, workflows
that involve analysing spatial data in a specialist package and then cutting and pasting results into
a word processing package are particularly prone to irreproducibility. One practical starting point
might be to adopt a compromise strategy, where a word processing package is used to write the
publication document, but a command-based approach is used to analyse the data.
To achieve this, it is possible to use StatWeave to process .odf files (the XML-based format
for OpenOffice files) with embedded code, and there is an R package odfweave offering the same
functionality (provided the embedded language is R). In both cases, the embedded code is typed
directly into a document, which is then saved and post-processed to replace the embedded code
with the output that it generates in a new .odf file. OpenOffice is then capable of saving the files
in .doc or .docx format, although obviously it is essential to distribute the original .odf files
with embedded code if the documentation is to be reproducible. A commercial alternative is the
Inference package (Blue Reference, Inc., 2011) which allows a number of languages (including R) to
be embedded into Microsoft Word documents.
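The post-processing idea can be sketched in Python under some loudly stated assumptions. An OpenDocument file is a zip package whose main text is stored in `content.xml`, with a `mimetype` entry stored first and uncompressed. Everything else below is hypothetical: the inline marker `{{py: ...}}`, the function names and the two-entry demo package (which is a stand-in, not a complete document) are invented for illustration and are not the syntax or API of StatWeave or odfweave.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Hypothetical inline marker, e.g. {{py: 3 * 4}}; not a StatWeave/odfweave syntax.
MARKER = re.compile(r"\{\{py:(.*?)\}\}")


def evaluate_markers(xml_text: str) -> str:
    """Replace each marker with the XML-escaped value of its Python expression."""
    return MARKER.sub(lambda m: escape(str(eval(m.group(1).strip()))), xml_text)


def rewrite_content(package: bytes, transform) -> bytes:
    """Return a new package in which content.xml has been passed through `transform`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            # Keep the mimetype entry uncompressed; compress everything else.
            method = zipfile.ZIP_STORED if item.filename == "mimetype" else zipfile.ZIP_DEFLATED
            dst.writestr(item.filename, data, compress_type=method)
    return out.getvalue()


def make_demo_package() -> bytes:
    """Build a tiny stand-in package in memory (not a complete document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", "<p>Sites surveyed: {{py: 3 * 4}}</p>",
                   compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


processed = rewrite_content(make_demo_package(), evaluate_markers)
with zipfile.ZipFile(io.BytesIO(processed)) as z:
    entry_names = z.namelist()
    result = z.read("content.xml").decode("utf-8")
print(result)
```

The original package is left untouched and a new one is produced, which mirrors the point made above: the source document with its embedded code is the reproducible artefact, and the processed copy is only a rendering of it.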
Using menu-driven applications instead of R: A final goal here is perhaps to facilitate
reproducible research when working with GUI-based analysis tools as well as word processing applications.
At present, this is likely to be the most difficult of the barriers to overcome, as the use of such tools
implies the cutting and pasting of tables and graphs between applications - divorcing the results
from the steps taken to obtain them. One potential way of maintaining a link between the two
applications is through the joint use of journalling and metadata provision.
Journalling in an application occurs when all of the operations carried out are logged in a textual
format. Effectively, although the user may be using menus and buttons to carry out a data analysis,