Topological acceleration


Mon, 01 Jul 2019

Science reproducibility: the software evolution problem

What is the point of publishing a scientific paper if an expert reader has to do so much extra work to independently reproduce the results that s/he is effectively discouraged from doing so?

Reproducibility: brief description

In current cosmology research practice, a paper tends to be accepted as "scientific" if the method is described clearly and in sufficient detail, and, for an observational paper, if the observational data are publicly available. However, free-licensed software, efficient management of software evolution via git repositories over the Internet, and Internet communication in general should, in principle, allow an expert reader to reproduce the figures and tables of a research paper with just a small handful of commands in a terminal that download, compile and run the scripts and programs provided by the authors of the research article. In practice, this would make it easier for more scientists to verify the method and results, and to improve on them, rather than forcing them to rewrite everything from scratch.

This idea has been floating around for several years. A very nice summary and discussion by Mohammad Akhlaghi includes Akhlaghi's own aim of making the complete research paper reproducible with just a few lines of shell commands, and links to several reproducible astronomy papers from 2012 to 2018, most of which use complementary methods.

I tend to agree that using Makefiles is most likely to be the optimal overall strategy for reproducible papers. For the moment, I've used a single shell script in 1902.09064.

The software evolution problem

I suspect, unfortunately, that there's a fundamental dilemma in making fully reproducible papers that remain reproducible in the long term, because of software evolution. Akhlaghi's approach is to download and compile all the libraries that are needed by the authors' software, in the specific versions that were used at the time of preparing the research paper. This would appear to solve the software evolution problem.

My approach, at least so far in 1902.09064, is to use the native operating system (Debian GNU/Linux, in my case) recommended versions of all libraries and other software, to the extent that these are available; and to download and compile specific versions of software that are "research-level" software, either not yet available in a standard GNU/Linux family operating system, or evolving too fast to be available in those systems.
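The choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in 1902.09064: the function name `plan_dependencies`, the package names and the commit string are all hypothetical.

```python
def plan_dependencies(needed, native_available, pinned):
    """Decide where each dependency should come from.

    needed:           iterable of package names the paper's code requires
    native_available: set of names the operating system already provides
    pinned:           {name: frozen version or commit} for research-level code
    """
    plan = {}
    for name in needed:
        if name in native_available:
            # Prefer the version recommended by the operating system.
            plan[name] = ("native", None)
        elif name in pinned:
            # Fall back to downloading and compiling a frozen version.
            plan[name] = ("download", pinned[name])
        else:
            raise KeyError(f"no native package and no pinned version for {name}")
    return plan


# Hypothetical example: one stable library, one research-level package.
example_plan = plan_dependencies(
    needed=["gsl", "research-code"],
    native_available={"gsl"},
    pinned={"research-code": "abc1234"},
)
```

Only packages missing from the operating system end up in the "download" branch, which keeps the list of frozen versions short.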

Download everything: pro

  • exact reproducibility: Downloading as much software, including various libraries, as possible, with specific commit hashes (frozen versions), should, in principle, enable users at any time later to reproduce exactly what the authors claimed they did to obtain their results. Tracing the origin of differing results should be easier than if these libraries are not downloaded in exactly the same versions.
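A git commit hash freezes a version because it identifies the content itself. The same idea can be sketched for a downloaded source archive by comparing its SHA-256 checksum against a value recorded by the authors. This is an analogous technique rather than git's own mechanism, and the helper names `sha256_of` and `matches_pinned` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned(path, expected_hex):
    """True if the file is byte-for-byte the version the authors recorded."""
    return sha256_of(path) == expected_hex.lower()
```

A reproduction script could refuse to compile any archive for which `matches_pinned` returns `False`, so that a silently changed download is noticed before it affects the results.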

Download everything: con

  • heaviness: Downloading the source code of libraries (such as the GNU Scientific Library) that are integrated into a well-tested, stable operating system such as Debian GNU/Linux (stable), and recompiling them from scratch, can consume a lot of download bandwidth and a lot of CPU time. If the user wishes to repeat the cycle from scratch many times, this becomes prohibitive in terms of user patience, and, in the context of the climate crisis, risks becoming unethical if the benefits are too modest compared to the costs in "carbon load".
  • security risks: Old versions of standard libraries can contain science errors, software errors, and security bugs. Reproducing the authors' science errors and software errors is acceptable, since the aim is to check the authors' claims. But running software with unfixed security flaws is unwise. Running user-space software that is out of date in terms of security is less risky than doing so for root-level software, but is still a risk. Once a cracker has obtained user-level remote access to a computer, escalating to root access is a much smaller challenge than getting initial access to the system.
  • dependency hell: "Science research" software depends on lower-level libraries (integrals, derivatives, interpolation, Monte Carlo methods) that themselves rely on lower-level numerical algorithm libraries, that need to be integrated with parallelisation libraries, and that themselves depend on the kernel and other system-level libraries. The FLOSS community is constantly improving and fixing these different software packages, as well as package management systems. Some packages or some of their functions may become unmaintained and obsolete. There is a complex logical graph of dependencies between software packages; the complexity is unlikely to weaken, and the ecosystem of software is not going to stop evolving. A package that can successfully download and compile a few hundred MB of associated source code and run correctly in 2019 might be extremely difficult to run in a standard software environment in 2029 or 2039. The user could be forced to deal with dependency hell in order to check the package from 2019.
  • inexact reproducibility?: Ten or twenty years after a research paper is published, how easy will it really be to provide an operating system and software environment that is truly identical to that used by the author? GNU/Linux-like operating systems are so diverse that few scientists will be really interested in trying to emulate "antique" operating systems or software environments.
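The "logical graph of dependencies" mentioned above can be made concrete with a small sketch. It assumes Python 3.9 or later (for the standard `graphlib` module); the package names and the function name `build_order` are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def build_order(depends_on):
    """Return an order in which every package comes after its dependencies.

    depends_on maps each package name to the set of packages it needs.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical layering: research code on top of numerical and
# parallelisation libraries, which in turn sit on a system library.
example_graph = {
    "paper-code": {"gsl", "openmpi"},
    "gsl": {"libc"},
    "openmpi": {"libc"},
    "libc": set(),
}
example_order = build_order(example_graph)
```

Every node in such a graph is a package that has to be fetched and compiled in a compatible version, which is why even a small change low in the graph can ripple upwards.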

Prefer native libraries: pro

  • efficiency: In distributions that are binary by default, such as Debian, the bandwidth and CPU loads for installing binary versions of libraries are much lighter than for the "download everything" approach.
  • security: Well-established software distribution communities such as Debian, with numerous quality assurance pipelines and methods and bug management, will tend to provide high (though never perfect) standards of software security, and will tend to correct science and software errors.
  • convenience: There is a complex logical graph of dependencies between software packages, and the complexity is unlikely to weaken. Using native libraries avoids dependency hell.

Prefer native libraries: con

  • faith in modularity: The "prefer native" approach effectively assumes that any bugs or science errors in the research level software lie in the "science level" software and are not the fault of libraries that are stable enough to be "native" in the operating system. But this might not always be the case: the fault might be in the native library, and either have been fixed, or have been introduced, in versions more recent than those used by the research article authors.
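One way to limit this risk, sketched here as an assumption rather than as part of the approach described in this post, is for the authors to record which native library versions were installed when the paper was prepared, so that a later reader can see what has changed. The function name `diff_versions` and the version strings are hypothetical.

```python
def diff_versions(then, now):
    """Compare two {package: version} records.

    Returns {package: (old_version, new_version)} for every package whose
    version differs; a missing package is reported as None.
    """
    changes = {}
    for name in sorted(set(then) | set(now)):
        old, new = then.get(name), now.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical records: the authors' system versus a later reader's system.
example_changes = diff_versions(
    {"gsl": "2.5", "fftw": "3.3.8"},
    {"gsl": "2.7", "fftw": "3.3.8"},
)
```

If the reproduced results differ, the packages listed in the returned dictionary are the first native libraries worth checking.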

Choosing an approach

While the "download everything" approach is, in principle, preferable in terms of hypothetical reproducibility, it risks being heavy, could have security risks, could be difficult due to dependency hell, and might in the long term not lead to exact reproducibility anyway, for practical reasons (leaving aside theoretical Turing machines). The "prefer native libraries" approach provides, in principle, less reproducibility, but it should be more efficient, secure and convenient, and, in practice, may be sufficient to trace bugs and science errors in scientific software.



content licence: CC-BY | blog tools: GNU/Linux, emacs, perl, blosxom