Mon, 01 Jul 2019
Science reproducibility: the software evolution problem
What is the point of publishing a scientific paper if an expert reader
has to do so much extra work to independently reproduce the results
that s/he is effectively discouraged from doing so?
Reproducibility: brief description
In the present practice of cosmology research, such a paper tends to
be accepted as "scientific" if the method is described in sufficient
detail and clearly enough, and if the observational data are
publicly available in the case of an observational paper. However,
the modern concepts of software sharing and of efficient management
of software evolution via git repositories over the Internet, together
with Internet communication in general, should, in principle, allow an
expert reader to reproduce the figures and tables of a research paper
with just a small handful of commands in a terminal: enough to
download, compile and run the scripts and programs provided by the
authors of the research article.
This will in practice make it easier for more scientists to verify
the method and results, and improve on them, rather than forcing them
to rewrite everything from scratch.
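For example, in an idealised case, the whole cycle might look something
like the following sketch, where the repository URL and script name are
purely illustrative placeholders rather than those of any actual paper:

    git clone https://example.org/reproducible-paper.git   # download the paper's scripts and source code
    cd reproducible-paper
    ./reproduce.sh   # download the data, compile the programs, regenerate all figures and tables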
This idea has been floating around for several years. A very nice
summary and discussion by Mohammad Akhlagi includes Akhlagi's own aim
of making a complete research paper reproducible with just a few lines
of shell commands, and links to several astronomical reproducible
papers from 2012 to 2018, most using complementary approaches.
I tend to agree that using Makefiles is most likely to be the optimal
overall strategy for reproducible papers. For the moment, though, I've
used a single shell script in my own reproducible-paper work.
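As a rough, hypothetical sketch, assuming illustrative step names
rather than the contents of any actual script, such a single top-level
script might look like this:

    #!/bin/sh
    # Top-level reproduction script: run each step in order, stopping at the first error.
    set -e
    ./download_data.sh       # fetch the publicly available observational data
    ./compile_programs.sh    # compile the research-level programs and libraries
    ./run_analysis.sh        # run the main calculations
    ./make_figures.sh        # regenerate the figures and tables of the paper

A Makefile would improve on this by re-running only the steps whose
inputs have changed, rather than repeating the full cycle every time.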
The software evolution problem
I suspect, unfortunately, that there's a fundamental
dilemma in making fully reproducible papers that remain
reproducible in the long term, because of software evolution.
Akhlagi's approach is to download and compile all the libraries
that are needed by the author(s)' software, in specific versions
of the software that were used at the time of preparing the research
paper. This would appear to solve the software evolution problem.
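In shell terms, the idea amounts to something like the following sketch
for a single library, here the GNU Scientific Library; the version, URL
and checksum placeholder are illustrative, and would be frozen by the
authors when preparing the paper:

    # Download a frozen release of the library and check it against the authors' recorded checksum.
    wget https://ftp.gnu.org/gnu/gsl/gsl-2.5.tar.gz
    echo "<sha256 recorded by the authors>  gsl-2.5.tar.gz" | sha256sum -c -
    tar -xzf gsl-2.5.tar.gz
    # Compile and install into a private prefix, leaving the system-wide libraries untouched.
    cd gsl-2.5
    ./configure --prefix="$HOME/paper-environment"
    make && make install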
My approach, at least so far, is to use the versions of all libraries
and other software recommended by the native operating system (Debian
GNU/Linux, in my case), to the extent that these are available; and to
download and compile specific versions of "research-level" software,
which is either not yet available in a standard GNU/Linux family
operating system, or evolving too fast to be packaged in those systems.
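A minimal sketch of this decision logic, again using the GNU Scientific
Library as the example and with an illustrative repository URL and
version tag:

    #!/bin/sh
    set -e
    # Prefer the library version packaged natively by the operating system, if present.
    if pkg-config --exists gsl; then
        echo "native GSL found: using the operating system's version"
    else
        echo "GSL not found: please install it natively, e.g. apt-get install libgsl-dev" >&2
        exit 1
    fi
    # "Research-level" code is still downloaded and compiled at an exact version chosen by the authors.
    git clone https://example.org/research-level-code.git
    (cd research-level-code && git checkout v0.3.1 && make)   # the tag and build command are placeholders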
Download everything: pro
- exact reproducibility: Downloading as much
software, including various libraries, as possible, with specific
commit hashes (frozen versions), should, in principle, enable
users at any time later to reproduce exactly what the authors
claimed they did to obtain their results. Tracing the origin of
differing results should be easier than if these libraries are not
downloaded in exactly the same versions.
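For a dependency hosted in a git repository, freezing "exactly the same
version" reduces to recording and checking out a single commit hash; a
sketch, with a placeholder URL and hash:

    git clone https://example.org/some-library.git
    cd some-library
    git checkout <commit hash recorded by the authors>   # freeze the exact version used for the paper
    git rev-parse HEAD   # print the hash actually checked out, to record or verify it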
Download everything: con
- heaviness: Downloading the source code of
libraries (such as the GNU Scientific Library) that are already
integrated into a well-tested, stable operating system such as Debian
GNU/Linux (stable), and recompiling them from scratch, can consume
a lot of download bandwidth and a lot of cpu time. If the user
wishes to repeat the cycle from scratch many times, this becomes a
heavy resource cost that, in the context of the climate crisis, risks
becoming unethical if the benefits are too modest compared to the
costs in "carbon load".
- security risks: Old versions of standard
libraries contain science errors, software errors, and security
bugs. Reproducing the science errors and software errors of the
authors is acceptable, since the aim is to check the authors'
claims. But running software with unfixed security flaws is
unwise. Running user-space software that is out-of-date in terms
of security is less risky than doing so for root-level software,
but is still a risk. Once a cracker has obtained user-level
remote access to a computer, escalating to root access is a much
smaller challenge than getting initial access to the system.
- dependency hell: "Science research" software
depends on lower level libraries (integrals, derivatives,
interpolation, Monte Carlo methods) that themselves rely on lower
level numerical algorithm libraries, that need to be integrated
with parallelisation libraries, and that themselves depend on the
kernel and other system-level libraries. The FLOSS community is
constantly improving and fixing these different software packages,
as well as package management systems. Some packages or some of
their functions may become unmaintained and obsolete. There is a
complex logical graph of dependencies between software packages,
the complexity is unlikely to weaken, and the ecosystem of
software is not going to stop evolving. A package that can
successfully download and compile a few hundred MB of associated
software source code and run correctly in 2019 might be extremely
difficult to run in a standard software environment in 2029 or 2039.
The user could be forced to deal with exactly this dependency hell
in order to check the package from 2019.
- inexact reproducibility?: Ten or 20 years after
a research paper is published, how easy will it really be to
provide an operating system and software environment that
is truly identical to that used by the authors? There is
such a diversity of GNU/Linux-like operating systems that few
scientists will really be interested in trying to emulate "antique"
operating systems and software environments.
Prefer native libraries: pro
- efficiency: In distributions that install binary packages by
default, such as Debian, the bandwidth and cpu loads for installing
binary versions of libraries are much lighter than for the "download
everything" approach (see the sketch after this list).
- security: Well-established software
distribution communities such as Debian, with numerous quality
assurance pipelines and methods and bug management, will tend to
provide high (though never perfect) standards of software
security, and to correct science and software errors.
- no dependency hell: There is a complex logical graph of
dependencies between software packages, and the complexity is
unlikely to weaken; using native libraries avoids dependency hell,
since the distribution's package management system resolves the
dependency graph.
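The native libraries are not rebuilt from source, but the versions
actually used can at least be recorded, so that a reader who later
obtains different results can check whether a native library has
changed in the meantime; a Debian-specific sketch, with example
package names:

    # Install the native development packages: binary downloads, so light on bandwidth and cpu time.
    sudo apt-get install libgsl-dev libfftw3-dev
    # Record the exact package versions used for the paper, for later tracing of any differences.
    dpkg-query -W -f='${Package} ${Version}\n' libgsl-dev libfftw3-dev > native-library-versions.txt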
Prefer native libraries: con
- faith in modularity: The "prefer native" approach
effectively assumes that any bugs or science errors lie in the
"science-level" research software, and are not the fault of libraries
that are stable enough to be "native" in the operating system. This
might not always be the case: the fault might lie in the native
library, and might either have been fixed, or have been introduced,
in versions more recent than those used by the authors of the
research article.
Choosing an approach
While the "download everything" approach is, in principle, preferable
in terms of hypothetical reproducibility, it risks being heavy, could
have security risks, could be difficult due to dependency hell,
and might in the long term not lead to exact reproducibility anyway,
for practical reasons (leaving aside theoretical Turing machines).
The "prefer native libraries" approach provides, in principle, less
reproducibility, but it should be more efficient, secure and convenient,
and, in practice, may be sufficient to trace bugs and science
errors in scientific software.