This suggests a need to consider how the widespread adoption of reproducible research practice in
GC could be achieved. The previous sections mainly consider this in terms of data processing
software and document production tools. While these are essential, it is equally important that
infrastructure and conventions exist to support these practices. For example, at present few
journals that publish GC work insist on the provision of code or data when articles are submitted.
Similarly, several would not accept an article submitted in the Rnw format discussed in this
chapter, preferring more commonly used formats such as LaTeX or Microsoft Word files. Journals in
other disciplines, however, are more aware of reproducible research. For example, the Biometrical
Journal now has an associate editor for reproducible research (Hothorn et al., 2009), and a number
of other journals have adopted similar policies, such as Biostatistics (Peng, 2009). The American
Journal of Epidemiology strongly encourages potential authors to provide code and data where
appropriate (Peng et al., 2006), and Annals of Internal Medicine also implements policies to
encourage reproducibility (Laine et al., 2007). Biostatistics has also adopted a kite-marking
system in which papers are marked D if the data on which they are based are openly available, C if
the code is openly available and R if both are available. In the last case, the paper is checked
for reproducibility, and the R mark is awarded only if the check succeeds. The journal publishes
all received code and data electronically as supplementary materials.
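The marking rule can be made concrete with a short Python sketch. It is illustrative only: the function name `kite_marks` and its arguments are hypothetical, and it assumes that marks accumulate (a paper awarded R keeps its D and C marks), which the description above does not state explicitly.

```python
from __future__ import annotations


def kite_marks(data_available: bool, code_available: bool, reproduced: bool = False) -> set[str]:
    """Return the marks a paper would receive under the D/C/R scheme.

    Assumption (not confirmed by the source): marks accumulate, so a paper
    earning R also keeps its D and C marks.
    """
    marks: set[str] = set()
    if data_available:
        marks.add("D")
    if code_available:
        marks.add("C")
    # R needs both data and code AND a successful reproducibility check.
    if data_available and code_available and reproduced:
        marks.add("R")
    return marks


if __name__ == "__main__":
    print(sorted(kite_marks(True, True, reproduced=False)))  # ['C', 'D']
    print(sorted(kite_marks(True, True, reproduced=True)))   # ['C', 'D', 'R']
```

Under this rule, a paper that supplies both code and data but fails the check receives D and C but not R, which is the point of making R conditional on the check rather than on availability alone.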
One step towards building a culture of reproducible research in GC might be for journals that
frequently publish GC research to adopt similar policies. This could be achieved with Sweave and
Stangle, either manually or automatically. Sweave can be used to create a reproducible document; on
the author's side, Sweave and Stangle could then be used to produce, respectively, the LaTeX file
for the article and the code used (possibly also containing data), and these could be supplied
separately. Alternatively, on the publication side, Stangle could be run automatically on an
uploaded Rnw file to extract the code, which could in turn be placed on the journal's website for
download.
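To illustrate what the automatic, publication-side step involves, the following Python sketch mimics the core of what Stangle does: it pulls the code chunks out of an Rnw file written in Noweb syntax, where a chunk opens with a `<<label, options>>=` line and closes at a line beginning with `@`. It is a simplified stand-in, not Stangle itself. The names `tangle` and `tangle_to_script` are hypothetical, chunk options such as `eval=FALSE` and chunk references are ignored, and the output layout differs from Stangle's. A real journal pipeline would more sensibly call R directly, for example with `R CMD Stangle paper.Rnw`.

```python
from __future__ import annotations

import re

# Simplified Noweb delimiters as used by Sweave: a code chunk opens with
# "<<label, options>>=" and a documentation chunk (which closes the code)
# opens with a line starting with "@".
CHUNK_START = re.compile(r"^\s*<<(.*)>>=")
CHUNK_END = re.compile(r"^\s*@(\s|$)")


def _chunk_label(header: str) -> str:
    """Label from '<<...>>=' contents: an explicit label= or a bare first option."""
    for i, opt in enumerate(o.strip() for o in header.split(",")):
        if opt.startswith("label="):
            return opt[len("label="):].strip()
        if i == 0 and opt and "=" not in opt:
            return opt
    return ""


def tangle(rnw_text: str) -> list[tuple[str, str]]:
    """Return (label, code) pairs for every code chunk in the Rnw text.

    Simplifications: chunk options (e.g. eval=FALSE) are ignored and
    chunk references such as <<other>> are not expanded.
    """
    chunks: list[tuple[str, str]] = []
    label, lines, in_code = "", [], False
    for line in rnw_text.splitlines():
        start = CHUNK_START.match(line)
        if start:
            if in_code:  # a new header implicitly closes the open chunk
                chunks.append((label, "\n".join(lines)))
            label = _chunk_label(start.group(1))
            lines, in_code = [], True
        elif in_code and CHUNK_END.match(line):
            chunks.append((label, "\n".join(lines)))
            in_code = False
        elif in_code:
            lines.append(line)
    if in_code:  # the text ended inside a code chunk
        chunks.append((label, "\n".join(lines)))
    return chunks


def tangle_to_script(rnw_text: str) -> str:
    """Concatenate all chunks into one script, with a comment line per chunk."""
    out: list[str] = []
    for i, (label, code) in enumerate(tangle(rnw_text), start=1):
        out.append(f"### chunk {i}: {label or '(unlabelled)'}")
        out.append(code)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "\\documentclass{article}\n"
        "<<load, echo=TRUE>>=\n"
        "x <- rnorm(10)\n"
        "@\n"
        "Some text.\n"
        "<<echo=FALSE>>=\n"
        "mean(x)\n"
        "@\n"
    )
    print(tangle_to_script(example), end="")
```

Running this on an uploaded file yields a single plain script per article, the kind of file that could then be posted alongside the paper for download.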
An alternative strategy is to use an independent resource to provide code and data. An example of
this is the Run My Code website.* Here, researchers are encouraged to upload their data and code to
a personal companion website associated with a publication. On this site, both code and data can
be downloaded, and in some cases the code can be run on a remote computer using either the
supplied data or new input supplied by the viewer of the web page. An option is provided to
conceal the contributors' identities, so that a link to the page can be given in an article
submitted for anonymous review. The main Run My Code site is essentially a portal where users can
search for articles by keyword or browse by category. The site focuses on financial and economic
research, although this is essentially reflected only in the browsing categories used to tag the
user-contributed companion sites. A similar site for GC could therefore easily be created by
mirroring its functionality with a more appropriate set of categories.
17.6 CONCLUSION
This chapter has set out to outline the basic principles of, and justification for, reproducible research
and its implications for GC. In terms of justification, and possibly of basic principles too, this
is perhaps best summarised by the following:
The idea is: An article about computational science in a scientific publication is not the scholarship itself,
it is merely advertising of the scholarship. The actual scholarship is the complete software development
environment and the complete set of instructions which generated the figures (Donoho et al., 2011).
* http://www.runmycode.org/CompanionSite/home.do.
For example, if the code is written in R or MATLAB®.