Tue, 07 Jul 2020
The Slack and Zoom gilded cage for astronomers
Why should astronomers not use Zoom or Slack for voice, video, text and file
communication over the Internet?
Practical reasons include:
1. we should "keep control of the software — so that the software doesn't
control us";
2. we should use software that allows and encourages interoperability, like
email, so that nobody is forced to use any particular server or software;
3. we should be able to easily export our communications and stored
information and copy these locally or shift them to another server;
4. we should not be forced to install unverifiable software that may contain
Trojans, backdoors or other malware.
Zoom and Slack both violate 1 — their software is non-free.
2. In 2018, Slack stopped allowing connections over two of the most widely
used messaging protocols, irc and xmpp: Slack is opposed to the freedom to
interconnect between instant messaging networks. Slack's strategy of
gradually "burning these bridges/gateways" as it increases its market
dominance is part of vendor lock-in.
3. You'll have to check whether Slack/Zoom make it easy to export your data, but I suspect that they do not.
This is another component of vendor lock-in.
(Interoperability and data export are in principle
closely related.)
4. Use of Zoom forces us to install dangerous software (binary blobs) on our
computers. We and the wider software community cannot verify that the binary
blobs needed for the Zoom client are free of backdoors and Trojans. The Zoom
client is unverifiable software.
Do the ends justify the means?
Independently of the practical reasons to not use Zoom or Slack (or Skype, MS
Teams/GAFAM, Webex), there are ethical reasons:
- "You
are the product, not the customer." When you use Zoom or
Slack,
you are the product that they sell to corporate clients.
They will do whatever they can to keep a big mass of users addicted to their
services, and sacrifice your privacy, freedom of interoperability, freedom to
backup your own data, or cybersecurity, if it is in their corporate interests
to sell these to the highest bidder.
- By pressuring your local community to use Slack or Zoom, you are weakening
the support for ethically constructed communities — those built on the
basis of free-licensed software, transparency, cooperation and intellectual
freedom. Developers of Jitsi/BBB/Jami/Matrix need bug reports, wishlist
items, open, constructive discussion and encouragement to continue. You are
also free to support these community software developers with money: free
software does not mean zero-payment software.
The most common counterargument to the practical and ethical arguments above
is the Tyranny of Convenience [Keye 2009] (and [Wu 2018]): "It works! It
works! I just want to communicate efficiently! I'm not an expert in software!
Most people in our community use it, so we should too. And Zoom/Slack has
feature X, which I couldn't find on Jitsi/BBB/Jami/Matrix in a five-second
search." This brings us back to consequentialism, the philosophical stance
according to which the ends justify the means. The question here is how bad
the means are compared to the ends. Software is at the core of the biggest
geopolitical and economic power struggles of the XXIst century. Is it worth
supporting authoritarian software and close-to-totalitarian software
corporations just because "it's convenient"? How many people in the XXth
century felt that convenience justified small actions, in themselves
"non-political" but implicitly supporting the totalitarian governments of
that century, only to regret it later? And how does Slack actually behave
towards its employees?
"Slack employees
... cannot speak out about [the propietary Slack software], for
fear of retribution (so they're inherently gagged by fear over
mortgage etc. or self-restraint that defies logic/ethics)",
according to Roy Schestowitz.
Alternatives exist! A complementary answer to the practical arguments above
is that if we want text, voice and video communication — after all, we're
humans, and during the pandemic it's especially important to keep up the
video-stream-to-video-stream contact, since it feels good — then we should
remember that we already have practical free software packages to run
ourselves, and servers that already run that software. Checking at
https://switching.software, we find ready alternatives.
Slack and Zoom control us if we use their services.
But we control Jitsi/BBB/Jami/Matrix.
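For example, running one's own Jitsi Meet server is nowadays a matter of a
few commands. Here is a minimal sketch following the jitsi/docker-jitsi-meet
quick-start, assuming Docker and docker-compose are installed (see that
project's documentation for the maintained instructions):

    # sketch: self-host a Jitsi Meet videoconferencing server with Docker
    # (assumes Docker and docker-compose are installed)
    git clone https://github.com/jitsi/docker-jitsi-meet
    cd docker-jitsi-meet
    cp env.example .env     # copy the template configuration
    ./gen-passwords.sh      # generate internal service passwords
    mkdir -p ~/.jitsi-meet-cfg/{web,prosody/config,jicofo,jvb}
    docker-compose up -d    # by default, serves https://localhost:8443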
Continuing to more robust communication: the big paradox is that people with
PhDs in astrophysics claim that they cannot handle irc. Irc is efficient,
robust and light-weight, and has matured through several decades of debugging
and development. You can choose any client of your liking on your own
computer — in a standalone gui, in a browser or in a terminal. It's not
rocket science. And since we cannot do "rocket science" without typing
equations, text, reasoning and specific lines of code — and since
observational files, databases, software, diagrams and git repositories all,
in the end, have to be handled as text — what's wrong with irc?
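For example, with the irssi terminal client (a sketch: the server and channel
names are placeholders for whatever network your community uses), it takes
one command and one /join:

    # sketch: join a community channel with the irssi terminal client
    # (irc.example.org and #cosmology are placeholders)
    irssi -c irc.example.org -n mynick
    # then, at the irssi prompt:
    #   /join #cosmology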
In any case, those who want audio/video have it with Jitsi/BBB/Jami/Matrix.
So not only are Zoom and Slack impractical and unethical: there's no need to
use them. They don't provide the freedom to communicate; instead, they
welcome us to a prison — one which, for the moment, seems to be gilded, but
is still a prison, with all the associated costs.
Mon, 01 Jul 2019
Science reproducibility: the software evolution problem
What is the point of publishing a scientific paper if an expert reader
has to do so much extra work to independently reproduce the results
that s/he is effectively discouraged from doing so?
Reproducibility: brief description
In the present practice of cosmology research, such a paper tends to
be accepted as "scientific" if the method is described in sufficient
detail and clearly enough, and if the observational data are
publicly available in the case of an observational paper. However,
the modern concepts
of free-licensed
software and efficient management of software evolution
via git repositories
over the Internet, as well as Internet communication in general,
should, in principle, make it possible for an expert reader to reproduce the
figures and tables of a research paper with just a small handful of terminal
commands that download, compile and run scripts and programs provided by the
authors of the research article.
This will in practice make it easier for more scientists to verify
the method and results, and improve on them, rather than forcing them
to rewrite everything from scratch.
This idea has been floating around for several years. A very nice summary and
discussion by Mohammad Akhlaghi includes Akhlaghi's own aim of making the
complete research paper reproducible with just a few lines of shell commands,
and links to several astronomical reproducible papers from 2012 to 2018, most
using complementary methods.
I tend to agree that using Makefiles is most likely to be
the optimal overall strategy for reproducible papers. For the
moment, I've used a single shell script in
1902.09064.
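As an illustration, a hypothetical skeleton of such a single-script pipeline
(the data URL and helper script names are invented for this sketch, and are
not those of 1902.09064) could be:

    #!/bin/sh
    # hypothetical skeleton of a one-command reproducible paper
    set -e                                     # abort at the first error
    wget https://data.example.org/survey.fits  # fetch the public observational data
    ./compile_programs.sh                      # compile the analysis programs
    ./run_analysis.sh survey.fits              # recompute the tables and figures
    pdflatex paper && bibtex paper && pdflatex paper && pdflatex paper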
The software evolution problem
I suspect, unfortunately, that there's a fundamental
dilemma in making fully reproducible papers that remain
reproducible in the long term, because of software evolution.
Akhlaghi's approach is to download and compile all the libraries that are
needed by the author(s)' software, pinned to the specific versions that were
used at the time of preparing the research paper. This would appear to solve
the software evolution problem.
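In shell terms, this strategy amounts to pinning each dependency to a frozen
version, along the lines of the following sketch (the repository URL, commit
hash and install prefix are all illustrative):

    # sketch: build one dependency at the exact frozen version used for the paper
    git clone https://git.example.org/somelib.git
    cd somelib
    git checkout 6f9a2c1          # commit hash recorded when the paper was written
    ./configure --prefix="$HOME/paper-env"
    make && make install
    export LD_LIBRARY_PATH="$HOME/paper-env/lib:$LD_LIBRARY_PATH"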
My approach, at least so far in
1902.09064,
is to use the native operating system's (Debian GNU/Linux, in my case)
recommended versions of all libraries and other software, to the extent that
these are available, and to download and compile specific versions only of
"research-level" software: software that is either not yet available in a
standard GNU/Linux family operating system or evolving too fast to be
packaged in those systems.
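On a Debian-family system, a sketch of this hybrid approach looks something
like the following (libgsl-dev and libcfitsio-dev are real Debian packages;
the research-code repository and version tag are illustrative):

    # native, distro-maintained libraries come from the package manager:
    sudo apt-get install libgsl-dev libcfitsio-dev
    # only the fast-evolving "research-level" code is pinned and compiled by hand:
    git clone https://git.example.org/research-code.git
    cd research-code
    git checkout v0.3.2           # frozen research-level version
    make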
Download everything: pro
- exact reproducibility: Downloading as much
software, including various libraries, as possible, with specific
commit hashes (frozen versions), should, in principle, enable
users at any time later to reproduce exactly what the authors
claimed they did to obtain their results. Tracing the origin of
differing results should be easier than if these libraries are not
downloaded in exactly the same versions.
Download everything: con
- heaviness: Downloading the source code of
libraries (such as the GNU Scientific Library) that are integrated
into a well-tested, stable operating system such as Debian
GNU/Linux (stable), and recompiling them from scratch, can consume
a lot of download bandwidth and a lot of cpu time. If the user
wishes to repeat the cycle from scratch many times, this becomes
prohibitive in terms of user patience, and in the context of the
climate crisis, risks becoming unethical if the benefits are too
modest compared to the costs in "carbon load".
- security risks: Old versions of standard
libraries contain science errors, software errors, and security
bugs. Reproducing the science errors and software errors of the
authors is acceptable, since the aim is to check the authors'
claims. But running software with unfixed security flaws is
unwise. Running user-space software that is out-of-date in terms
of security is less risky than doing so for root-level software,
but is still a risk. Once a cracker has obtained user-level
remote access to a computer, escalating to root access is a much
smaller challenge than getting initial access to the system.
- dependency hell: "Science research" software
depends on lower-level libraries (integrals, derivatives,
interpolation, Monte Carlo methods) that themselves rely on lower-level
numerical algorithm libraries, which need to be integrated with
parallelisation libraries and themselves depend on the kernel and other
system-level libraries. The FLOSS community is constantly improving and
fixing these different software packages, as well as package management
systems. Some packages or some of their functions may become unmaintained
and obsolete. There is a complex logical graph of dependencies between
software packages, the complexity is unlikely to weaken, and the ecosystem
of software is not going to stop evolving. A package that can successfully
download and compile a few hundred MB of associated software source code and
correctly run in 2019 might be extremely difficult to run in a standard
software environment in 2029 or 2039. The user could be forced to deal with
dependency hell in order to check the package from 2019.
- inexact reproducibility?: Ten or 20 years after
a research paper is published, how easy will it really be to
provide an operating system and software environment that
is really identical to that used by the author? There is
such a diversity of GNU/Linux-like operating systems that few
scientists will really be interested in trying to emulate "antique"
operating systems/software environments.
Prefer native libraries: pro
- efficiency: In by-default binary distributions,
such as Debian, the bandwidth and cpu loads for binary versions of
libraries are much lighter than for the "download everything" approach.
- security: Well-established software
distribution communities such as Debian, with numerous quality
assurance pipelines, methods and bug management, will tend to
provide high (though never perfect) standards of software
security, and to correct science and software errors.
- convenience:
There is a complex logical graph of dependencies between
software packages, and the complexity is unlikely to weaken.
Using native libraries avoids dependency hell.
Prefer native libraries: con
- faith in modularity: The "prefer native" approach
effectively assumes that any bugs or science errors in the research-level
software lie in the "science-level" software and are not the
fault of libraries that are stable enough to be "native" in the
operating system. But this might not always be the case: the fault
might lie in the native library, and might either have been fixed, or have
been introduced, in versions more recent than those used by the
research article authors.
Choosing an approach
While the "download everything" approach is, in principle, preferable
in terms of hypothetical reproducibility, it risks being heavy, could
have security risks, could be difficult due to dependency hell,
and might in the long term not lead to exact reproducibility anyway,
for practical reasons (leaving aside theoretical Turing machines).
The "prefer native libraries" approach provides, in principle, less
reproducibility, but it should be more efficient, secure and convenient,
and, in practice, may be sufficient to trace bugs and science
errors in scientific software.
Sat, 06 Apr 2019
Why non-use of ArXiv refs in a bibliography is unethical
It has become quasi-obligatory since the late 1990s for cosmology
research articles to be posted at
the ArXiv preprint server, making
them publicly available under
green open
access. Much of the astronomy, physics and mathematics literature
needed for cosmology research is also available at ArXiv. In
practice, this means that almost all post-mid-late-1990s literature
cited in cosmology research articles is available on ArXiv.
Many of these articles are posted before external
peer-review by research journals, so they are literally "preprints",
while others are posted after acceptance by a journal, but usually
before they appear in paper versions of the journals, for those
journals that are still printed on paper, or as online "officially
published" articles. However, most of these "preprints" are cited
before they are formally published — because they're
hot-off-the-press, state-of-the-art results, or to put in plain
English rather than advertising jargon, they're useful new results
that need to be taken into account. Several journals, including MNRAS
and A&A, insist on hiding the fact that references are easily
obtainable without paywall blocks by requiring all references that
have peer-reviewed bibliographic data to have
their ArXiv
identifiers removed from the list of references (bibliography) of
any research paper!
The reason cited by colleagues (there doesn't seem to be a formal
public justification by MNRAS/A&A) for excluding ArXiv identifiers
from the bibliography for articles that are already formally published
is to restrict citations as much as possible to the peer-reviewed
literature. But this is nonsense: including both the peer-reviewed
identifying information (year, journal name, volume, first page)
and the ArXiv identifier informs the reader that the
article is peer-reviewed, while also guaranteeing
that the article is available to the reader (at least) under green
open access. So that reason is unconvincing.
Another reason cited by colleagues is that the journal versions are
more valid than the preprints, since the journal versions have usually
been updated following peer-review and following language editor and
proof-reader requests for corrections. This reason has some validity,
but in practice is weak. Article authors quite frequently update their
preprint on ArXiv to match the final accepted version of their article
(in content, not in the particular details of layout, to reduce the
chance of copyright complaints by the journals), because they know
that many people will access the green open access version, and they
want to reduce the risk that readers will refer to an out-of-date
preprint version. Other authors only post their article on ArXiv once
it is already accepted, in which case no significant revision is
needed to match the content of the accepted version.
If the reasons for hiding ArXiv references are weak, what are the
reasons for including ArXiv references?
- For articles that are not provided in open access mode by the publishers
either immediately or after an embargo period
(such as many cosmology journals, including JCAP, PRD and CQG,
which seem to block all of their
articles behind paywalls unless open access charges are paid by the
authors at an appropriate step of submitting the article for
publication),
removing/omitting ArXiv references from a reference list
blocks access to the research articles for:
1. scientists (physicists, mathematicians) in institutes that do not
pay for subscriptions to astronomy/cosmology journals;
2. astronomers in institutes that do not
pay for subscriptions to maths/physics journals containing articles
with justifications of mathematical techniques or physics that is
not published in astronomy journals;
3. scientists (astronomers, physicists, mathematicians) in
institutes/universities that do not pay for global subscriptions to
the publishers of the journals referred to;
4. scientists in poor countries that do not pay for any journal
subscriptions at all;
5. the general public — including former
astronomy/cosmology students who retain an interest in
cosmology research and have the competence to understand
research articles — who do not have access to any research
institute or university journal subscriptions.
Arguments 1, 2, and 3 are practical problems; these researchers will generally
know that they can search ArXiv and the
ADS
and after 30–120 seconds will find out if the article is available
on ArXiv, or possibly by open access on the journal website.
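For example, the 30–120 second check is essentially one command (a sketch;
xdg-open launches your default browser, and 1902.09064 is the identifier used
as an example elsewhere on this blog):

    # sketch: check whether a cited article has a green open access version
    xdg-open "https://arxiv.org/abs/1902.09064"
    # or query ADS, which links to the ArXiv version when one exists:
    xdg-open "https://ui.adsabs.harvard.edu/abs/arXiv:1902.09064"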
Argument 4 here can be considered as a form of racism. There are several
Nobel prizes explicitly related to
Bose's
contributions to physics, and
Chandrasekhar
actually got a Nobel prize rather than merely having
his name cited in the topics of Nobel prize awards; but the
reality of today's economic/political/sociological setup is
that the budgets of many Indian astronomy research institutes are
far lower than those of rich-country institutes, so excellent scientists
with high international reputations, and their undergraduate and
postgraduate students, have to do research without
access to any paid journal subscriptions.
Argument 5 could be considered as arrogance, elitism, and/or bad
public relations in the Internet epoch.
-
A&A now has a short embargo (12 months?) for paywall blocks
on articles, after which all articles become gold open access (with no
extra charges to authors); MNRAS has a longer embargo, and other journals
are under pressure to shift to open access. So what are the arguments
for including ArXiv identifiers for peer-reviewed articles
that are available under open access from the
publishers?
- It would require a lot of extra administrative effort by authors to
update their .bib files depending on the dates on which articles become
open access after an embargo;
- It would require a lot of extra administrative effort by authors to
modify their .bib files to separate out journals whose articles are never
open access from those with an embargo period;
- Authors at institutions with some or many journal subscriptions
generally don't notice whether or not a cited article
is behind a paywall, because the publishers' servers usually have
IP filters that automatically recognise authors' computers as having
authorisation to access the articles.
- Although big journal publishers can probably be relied
on, to some degree, to maintain their article archives in
the long term, we know that the group of people running ArXiv have
solid experience in long-term archiving and backup
(data storage redundancy) practices, and they have no
conflict between commercial motivations and scientific
aims.
- A typical article has anywhere from 30 to 100 or so references.
Each of those also has 30 to 100 or so "second-level" references,
and so on. Even if the n-th level references are to a large degree
redundant, a complete survey of the third or fourth level of references
could easily cover 1000–10,000 articles. Obviously, nobody is going to
read that many background articles, or even their abstracts;
in practice, a reader can only trace back a modest number
of references, and a modest number of references in those references.
So for those articles that can, after a little effort, be found
by the reader despite the ArXiv identifier being
omitted, or as publisher-provided online articles, the hiding of the
ArXiv identifier (and the lack of a clickable ArXiv link) slows down the
time for the reader to find the abstract and decide whether or not to
read further. Even though the slowdown might only be an extra minute,
multiplying that extra minute by the number of references to be potentially
checked leads to a big number of minutes. Adding unnecessary "administrative"
work for the reader is obstructive.
So that's why you should include ArXiv references in the
bibliographies of your research articles. You can set up a LaTeX
command so that, if the journal asks you to remove them, you do so
at the final stage for your "official" version, because you don't
want to waste time trying to convince the journal about the ethical
arguments above. But in your ArXiv versions and other versions that
you might distribute to colleagues, you should favour the more
ethical versions, which include the ArXiv references.
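A minimal sketch of such a toggle (assuming the hyperref package is loaded in
the preamble; the macro name is illustrative, and your bibliography entries
would need to call it):

    % sketch: a single switch for including/excluding ArXiv identifiers
    % (assumes \usepackage{hyperref} in the preamble)
    \newif\ifarxivids
    \arxividstrue   % change to \arxividsfalse for the journal's "official" version
    \newcommand{\eprint}[1]{\ifarxivids\space\href{https://arxiv.org/abs/#1}{arXiv:#1}\fi}
    % end each bibliography entry with, e.g., \eprint{1902.09064}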