non-disclosure of computational details is due to a third party rather than the author of the research
and can be resolved if, whenever possible, open source software is used.
However, attention in this discussion is focussed on Situations 3 and 4. Both of these situations
arise if exact details are not made widely available. In most cases, researchers do not withhold
these details with malice aforethought; few journals insist that such precise details be provided.
Although, in general, researchers must cite the sources of secondary data, such citations often
consist of an acknowledgement of the agency that supplied the data, possibly with a link to a general
website, rather than an explicit link (or links) to the file (or files) containing the actual data used in
the research. Similarly, the situation described earlier, in which computational processes are described
verbally rather than in a more precise algorithmic form, is often considered acceptable for
publication. For Situation 3, an alternative is provided by journals such as Earth System Science Data,*
which provide an open access resource in which data sets and their methods of collection are
described, with links to the data themselves provided via resources such as Pangaea®. However, this
does not resolve the issue in Situation 4, which is considered next.
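The difference between acknowledging an agency and identifying the actual file used can be made concrete with a short sketch. The Python code below is an illustration only, not a method described in this chapter: the function names `describe_data_file` and `write_manifest`, the record fields and the manifest file are all hypothetical. It assumes the data file has already been downloaded, and records an explicit link to that file together with a checksum of its exact contents.

```python
import hashlib
import json
from pathlib import Path


def describe_data_file(path, source_url):
    """Return a provenance record for one data file used in an analysis.

    `source_url` should point at the file itself, not at the supplier's
    general website (the field names here are illustrative assumptions).
    """
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "source_url": source_url,
        "size_bytes": len(data),
        # The checksum identifies the exact bytes that were analysed.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def write_manifest(records, manifest_path):
    """Save the provenance records as JSON alongside the analysis outputs."""
    Path(manifest_path).write_text(json.dumps(records, indent=2, sort_keys=True))
```

A manifest of this kind could accompany a paper as supplementary material, so that a reader who later downloads the same file can check that its checksum matches the one recorded.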
17.2.2 Software Barriers to Reproducibility
Another source of uncertainty in identifying the exact data sources or code used is that this
information is not necessarily organised by the researchers in a way that enables complete recall.
Typically, this occurs when the software used to carry out the analysis is interactive: a
number of menu items are chosen, buttons clicked and so on, before a table or a graph is produced
and then cut and pasted into a word processing document. Unfortunately, although interactive
software is easier to use, its output is less reproducible. Some months after the original analysis, it may
be difficult to recall exactly which options were chosen when the analysis took place. Cutting and
pasting the results into another document essentially breaks the link between the analysis itself and
the reporting of that analysis: the final document shows the output but says nothing about how it
was obtained.
In general, unless interactive software has a recording facility, in which the commands associated with
mouse clicks are saved in some format and can be replayed in order, graphical user interfaces (GUIs)
and reproducible research do not go well together. However, even when analysis is carried out using
scripts, reproducibility cannot be guaranteed. For example, on returning to a long-completed project,
one may find a number of script files with similar content, but no information about which one was
actually run to produce the reported results, or indeed whether chunks of code (that is,
collections of lines of code used to carry out some part of the analysis) from several different files
were pasted into a command-line interface to obtain the reported results.
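One modest way of reducing the ambiguity about which script produced which result is to have each script record its own identity whenever it writes an output. The following Python sketch is an assumption made for illustration rather than a practice described in the text: the function names `provenance_stamp` and `append_to_log`, the field names and the log file are hypothetical. It stores the script's file name, a checksum of its contents at the time of the run, the interpreter version and a timestamp.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def provenance_stamp(script_path, output_name):
    """Describe which script, in which exact state, produced an output."""
    source = Path(script_path).read_bytes()
    return {
        "output": output_name,
        "script": Path(script_path).name,
        # Two near-identical script files will have different checksums.
        "script_sha256": hashlib.sha256(source).hexdigest(),
        "python": sys.version.split()[0],
        "run_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_log(stamp, log_path):
    """Append one stamp per line so that earlier runs are never overwritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(stamp) + "\n")
```

Within an analysis script, a call such as `append_to_log(provenance_stamp(__file__, "table2.csv"), "provenance.jsonl")` placed next to the line that writes the table would leave a trail linking that table to one particular version of one particular file. This does not help when code is pasted by hand into a command-line interface, since no script file is then being run.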
17.3 LITERATE PROGRAMMING
To address these problems, one proposed approach is literate programming (Knuth, 1984).
Originally, this was intended as a means of improving the documentation of programs: a single file
(originally called a WEB file) containing the program documentation and the actual code is used to
generate both a human-readable document (via a program called WEAVE) and computer-readable
content (via a program called TANGLE) to be fed to a compiler or interpreter. The original
intention was that the human-readable output provided a description of the design of the program (and
also neatly printed listings of the code), offering a richer explanation of the program's function
than conventional comment statements. However, WEB files could be used in a slightly different
way, where rather than describing the code, the human-readable output reports the results of data
analysis performed by the incorporated code. In this way, information about both the reporting and
the processing can be contained in a single document. In this case, rather than a more traditional
* http://www.earth-system-science-data.net/.