To software engineers criticizing Neil Ferguson’s epidemics simulation code (khinsen.net)
90 points by todsacerdoti on May 18, 2020 | 186 comments


I poked at the github repo for a bit. The ugliness of the code doesn't bother me, but the quantity of parameters does.

Here's one params file that specifies some of the inputs to a run of the model:

https://github.com/mrc-ide/covid-sim/blob/master/data/param_...

Here's another one:

https://github.com/mrc-ide/covid-sim/blob/master/data/admin_...

There are hundreds of constants in there. A lot of them appear to be wild-ass guesses. Presumably, all of them affect the output of the model in some way.

When a model has enough parameters for which you can make unsubstantiated guesses, you have a ton of wiggle room to generate whatever particular output you want. I'd like to see policy and public discussion focus more on the key parameters (R-naught, hospitalization rate, fatality rate) and less on overly-sophisticated models.
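
A toy demonstration of that wiggle room (nothing to do with the covid-sim parameters themselves, just the general point): give a curve as many free parameters as there are data points and it will "explain" pure noise perfectly.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-1, 1, 10)
    y = rng.normal(size=10)           # pure noise, no signal at all

    coeffs = np.polyfit(x, y, deg=9)  # 10 free parameters for 10 points
    residual = np.max(np.abs(np.polyval(coeffs, x) - y))
    print(residual)                   # tiny: the curve passes through the noise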


You're correct to focus on the effect of parameter choices over code quality. It's been a little funny to watch a bunch of software engineers freak out about unit tests while ignoring everything else that has a much larger impact on the output of the model. I would bet large sums of money that this code is producing the correct output according to the model/parameter specifications.

All I can say is welcome to epidemiology. The spread of a disease is highly dependent on a host of factors that we have very little insight into. Even simple things like hospitalization rate or fatality rate can be difficult if not impossible to estimate accurately. Epidemiologists are open about this, but few people ever want to listen. Humans just aren't good at truly conceptualizing uncertainty.

The theory behind disease spread models is relatively sound, but they're highly dependent on accurate estimates of input parameters, and governments have not prioritized devoting resources toward improving those estimates. I sat in on discussions between epidemiologists and government officials about COVID models. The response to nearly every question was "we don't know, but here's our best guess". I listened to them beg officials for random testing of the population to improve their parameter estimates. That testing never happened.


I would bet large sums of money that this code is producing the correct output according to the model/parameter specifications.

I'll take that money off you then.

The code has various memory safety bugs in it and originally had a typo in a random number generator constant. Amongst other problems.

There's really no reason to believe it produces correct outputs, in fact, we know it didn't and probably still doesn't given how it was written.


The problem is, unsophisticated models do not predict anything. You apply them in one country and they do ok, and apply them in another and they get it totally and completely wrong.

Unless all important factors are accounted for, they are going to result in incorrect information for someone. Public policy will then be based on incorrect predictions. People will grow tired of the predictions being wrong and they'll give up on data science entirely.

It's already quite bad that people think they can choose their reality by finding numbers that agree with them and ignoring the ones that don't.

I do understand the point you are making, which is like the epicycles argument. But in global warming and epidemics alike, more parameters are actually needed to model reality.

I do agree, though, that those parameters should be based on actual data rather than guesses. But what value of R would you pick? Is that actually well constrained?


I would pick a value of R that shows itself to have good predictive accuracy.

The way to test predictive models is always to look at their predictive accuracy on holdout data. Machine learning has this ingrained. Classic statistics does this too -- AIC is used to compare models, and it is (asymptotically) equivalent to leave-one-out cross-validation [1].

There's nothing intrinsically wrong with models that have millions of parameters; they might overfit in which case they will have poor predictive accuracy on holdout data, or they might predict well.
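
To make that concrete, here's a minimal sketch of what "judge the model by its holdout accuracy" means in practice; the synthetic data and the least-squares "model" below are stand-ins for whatever simulator or regression you actually have:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    # fit on one part of the data only
    X_fit, y_fit = X[:80], y[:80]
    X_hold, y_hold = X[80:], y[80:]
    params, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)

    # judge the model by its error on data it never saw
    holdout_mse = np.mean((X_hold @ params - y_hold) ** 2)
    print(holdout_mse)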

I agree with the original article that software engineer scrutiny isn't appropriate for this sort of code -- but I would argue instead that it needs a general-purpose statistician or data scientist or ML expert to evaluate its predictive accuracy. You can't possibly figure this out from a simulator codebase.

At the time the model was published, and acted on by the UK government, there was very little data on which to test predictive accuracy. That's fine -- all it means is that the predictions should have been presented with gigantic confidence intervals.

[1] http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf


The model isn't predictive though - it's a simulator. If we'd waited until we had enough data to make predictions with it (which I doubt you could given the sheer number of parameters) it'd be too late to use any of the interventions.

How would you ethically collect training data for the interventions?


The outputs of the model _were_ being treated as predictions.

The Ferguson paper from 16 March used the language of prediction: "In the (unlikely) absence of any control measures [...] given an estimated R0 of 2.4, we predict 81% of the GB and US populations would be infected over the course of the epidemic." [1]. The news coverage also used that language: "Imperial researchers model likely impact of public health measures" [2]. And look at the rest of the comments in this discussion, and count how many times "predict" appears!

> If we'd waited until we had enough data to make predictions with it

This is like the drunk looking for their keys under a streetlight. "Did you lose the keys here?" "No, but the light is much better here." -- "How confident are you in your model's predictions?" "I have no idea, but it's the model I have."

Also -- the Ferguson model made predictions, based on the parameters they picked. You don't need to wait for data to make predictions; you only need data to validate your predictions.

> How would you ethically collect training data for the interventions?

You don't. You (as a scientist who influences public policy) should publish validated confidence intervals for your predictions. You (as a government) should understand that there is a huge margin of uncertainty in the predictions, and accept that sometimes you just have to make decisions in the absence of knowledge. You (both the scientist and the government) do not go around spouting "Our decisions are led by science".

[1] https://spiral.imperial.ac.uk:8443/bitstream/10044/1/77482/1...

[2] https://www.imperial.ac.uk/news/196234/covid19-imperial-rese...


How do you validate the predictions for the number of infected cases in May for scenarios that don't happen?


> The problem is, unsophisticated models do not predict anything. You apply them in one country and they do ok, and apply them in another and they get it totally and completely wrong.

That's the nature of all models, "sophisticated" or not. Relatively simple models may or may not be useful for a particular case, just as relatively complex models may be.


"But what value of R would you pick?"

I don't know -- and until we can agree on the answer to your simple question with a high degree of confidence, I think complex models based on specific assumed values of R obscure more than they reveal.

A little bit of modeling is useful because humans are intuitively bad at exponential math and we need scary graphs to jolt us awake sometimes. But when we don't even know the basic parameters (transmission/hospitalization/fatality) with a high degree of precision, complex models with myriad parameters create a false sense of confidence.
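
As a rough illustration of how unforgiving the exponential math is (toy numbers, not any particular model's output): assuming a serial interval of about 5 days, a couple of months separates a nuisance from a catastrophe depending on R.

    generations = 60 / 5          # ~5-day serial interval over 60 days
    for R in (1.3, 2.4):          # assumed reproduction numbers
        print(R, R ** generations)
    # R=1.3 gives ~23x growth; R=2.4 gives ~36,000x growth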


We have a model, we can run some sensitivity analysis, then we can go out and collect data to better estimate the parameters to which the model is sensitive. Important but not glamorous work, and hence underfunded.
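
For anyone unfamiliar with the term, a crude one-at-a-time sensitivity analysis can be as simple as the sketch below; `simulate` here is a deliberately silly placeholder, not anything from the actual model.

    def simulate(params):
        # placeholder model: deaths scale with attack rate and fatality rate
        attack_rate = min(1.0, params["R0"] / 3)
        return params["population"] * attack_rate * params["ifr"]

    base = {"population": 66e6, "R0": 2.4, "ifr": 0.009}
    baseline = simulate(base)

    for name in ("R0", "ifr"):
        bumped = dict(base, **{name: base[name] * 1.1})   # +10% on one input
        change = (simulate(bumped) - baseline) / baseline
        print(f"{name}: {change:+.1%} output change for a +10% input change")

    # Parameters with the largest output changes are the ones worth
    # spending measurement effort on.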


I was asked to look at the spatiotemporal parameters and modeling, separate from any code issues. That part of the model is astonishingly naive, apparently oblivious to existing research and science on the matter that strongly recommends a different and much more nuanced approach. Industry has invested inordinate amounts of money in understanding how to build effective real-world predictive models of this type and none of that knowledge is reflected here. That seems like a rather glaring oversight and alone voids any utility as a predictive model.


I partially agree with the comment above, but I also think it misunderstands how numerical models are often used. At least where I've built them (not epidemiology), the goal wasn't necessarily to gather the most accurate set of inputs and produce the most accurate prediction of the output. The goal was often to help a highly skilled operator explore the parameter space and guide their intuition on the problem, to help that person and simulation together reach some decision.

So code quality mattered less than usual. If there's a significant bug, then the operator will probably notice, and if there's an insignificant bug then no one cares. The large number of input parameters also doesn't matter. The operators are fully aware that they could artificially manipulate the output to wherever they wanted, but to do so would be cheating only themselves.

It feels to me like Ferguson's model was built with similar intent, and probably served that purpose well. The problem came only when the media portrayed the model as a source of authority apart from the people operating it, perhaps to create a feeling of objectivity behind the decisions driven from that. That created an expectation of rigor that either didn't exist (in the software engineering), or fundamentally can't exist given our current knowledge of the science (in the input assumptions).


This reminds me of the Drake equation. A sound formula for the probability of extraterrestrial life, but half the parameters are wild guesses that can differ by orders of magnitude.
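
The analogy works because the Drake equation is just a product of seven factors, so every order-of-magnitude guess multiplies straight through. A toy version (the low/high ranges are arbitrary, only there to show the spread):

    # Each factor as a (low, high) guess; several differ by orders of magnitude.
    factors = [
        (1, 10),        # star formation rate
        (0.2, 1),       # fraction of stars with planets
        (0.1, 1),       # habitable planets per system
        (0.001, 1),     # fraction developing life
        (0.001, 1),     # fraction developing intelligence
        (0.01, 1),      # fraction that communicate
        (100, 1e8),     # civilization lifetime, years
    ]

    low = high = 1.0
    for lo, hi in factors:
        low, high = low * lo, high * hi
    print(low, high)    # the "answer" spans many orders of magnitude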


The flip side to having lots of parameters is that you have lots of knobs to tune beyond a basic lockdown.


Not a comment on the specific repo in question, but I just want to note that I have seen utter monstrosities of academic code written in Python, MATLAB, and R - languages that are ostensibly "easier" than C++. I do not think that poor code quality is due to the many footguns C++ admittedly gives you.

I am sure that the main research is not in the implemented code. But with unclear code, it is exceedingly hard to know that there are no mistakes: that the researched models have been properly encoded.

That, I believe, is what software engineers are afraid of.


Exactly. The C++ code in the GitHub repo is absolutely frightening, but I'm 100% sure it would be pretty much the same in e.g. Python, probably even worse. From my own experience code quality is inversely related to the barrier to entry, which means I see a lot more terrible MATLAB and Python code than terrible C++ code.

The only conclusion you can draw from a repo like this, is the conclusion I've drawn countless times when the next piece of modeling code developed by some math or physics expert lands on my desk: these kinds of things can not be developed by domain experts alone, you always need to include skilled software engineers to translate the prototypes they develop (preferably in something like MATLAB) into production code.


How do you propose identifying these "skilled software engineers"? University doesn't teach "professional software engineering" so all these hotshots coming out of school into industry can't be what you're looking for.

There is no standards body defining the skills one needs to be considered professional. There is no responsibility on practitioners unlike other engineering professions. If a mechanical engineer screws up and causes death, it's going to be bad for them. Software routinely screws up with no consequence.

Hell, take two "software engineers" from FAANG and give them a piece of code and they won't agree on whether it's good or bad.

None of this is to excuse poorly written scientific code, but if "professional software engineers" want to throw rocks, maybe they should fix their own glass house first.


I work in a research setting with access to lots of skilled software engineers and I've come to a very similar conclusion. The problem I often see is that there is very little incentive for the researcher to bring a software engineer onto their project unless it is absolutely necessary. It is just now becoming commonplace to release code when you publish, and plenty of researchers still don't.

If you think no one is going to see your code then it makes it much harder to care about its quality. One way we've been fighting this internally is by trying to get researchers less silo'd and more open to code reviews. Once a code review is part of the process then bringing on an "expert coder" to address some of the issues that come up in the reviews has more tangible results! We've seen some success in improving our code quality with this strategy.


> it is exceedingly hard to know that there are no mistakes

But that's true of all software, no matter how well engineered. Usually when we discuss issues of code quality, we're looking at long-term maintainability and overall efficiency - that is, targets that are achievable. I'd love to see some effort toward quality metrics around provability of correctness, but I'm not even sure that's possible.


Without formal provability, yes, it's impossible to know there are no mistakes. Even then as most formal methods appear to operate on rules and requirements rather than on the actual code, it's still not possible.

However - with well formatted code that's had a review process of some sort, it's going to be easier to cut down on the glaring errors even if there are less obvious problems lurking.


I'm not sure I would describe R as "easier" than C++ tbh.


It's much more accessible, as you need to know less to use it.

Consider - using Eigen in a C++ project, or calling install.packages("eigen") in R.


It's easier to dip your toes into, but it is a terrible language to learn software development with. It's such a hodge-podge of ill-thought-out and ill-fitting components with random names and no overall sense of structure that you're never going to learn core CS concepts from it, let alone good programming practice.


That's not really its purpose though, right?

R is "an environment for statistical computing and graphics". Note that it doesn't talk about software engineering at all.

I agree with you that R has lots of rough edges, but please remember that it's a 90's era clone of a 70's era language (S) and a lot of those rough edges and corners are legacies from that time.

I completely agree that the naming conventions (i.e. the absence of same) are super annoying, but again it's a tradeoff for the decades of statistical computing knowledge embedded in the system of R.

I find your disdain for R a little annoying, and while I'm probably not a computer scientist, it was my first language and I learned about closures, higher order functions, OOP, testing and interacting with API's from it, and it definitely gave me insight into how computers worked.

I'm sorry that you have to deal with horrible legacy R code, but that's no reason to throw the baby out with the bathwater.


> I learned about closures, higher order functions, OOP, testing and interacting with API's from it, and it definitely gave me insight into how computers worked.

Well done, I'm honestly impressed :D

> remember that it's a 90's era clone of a 70's era language

That's not an excuse. Lisp is two decades older.

> I find your disdain for R a little annoying ... I'm sorry that you have to deal with horrible legacy R code

This has nothing to do with legacy code - I rarely have to touch that. (Scientific code is rarely reused...). R does offer a lot of great features, I'll grant you that. It has a huge library with great functions for doing statistical analysis and really neat graphics - exactly what it was designed for. As a biologist, I cannot imagine not using it.

But, and this is a big but, as a programming language it is absolutely horrible. I've worked with half a dozen languages, and none of them are anywhere near as big a pain in the backside as R. Most of that has to do with the terrible incongruity of a plethora of mutually-incompatible data types. You know, you have one function that will only work on a data frame, but somehow your data ended up as a list; or you want strings or numbers but have factors; that kind of thing. In no other language do I have to do as much googling while coding, because even basic operations are so willy-nilly idiosyncratic in their details that I keep forgetting how to do them properly. The "tidyverse" and related packages do a lot to make the experience less painful, but pure R remains a linguistic nightmare.


I really appreciate this sentiment as a data person who learned R first, now works in python, and helps non-engineer scientists write R programs for research. I've had great luck teaching scientists to think in a function-driven way, where functions are pure-as-possible and inputs are never mutated (which doesn't need to be discussed as R makes the opposite quite unnatural). I can't imagine teaching these same folks about python, custom classes, and why you need pd.DataFrame.copy() all over the place.
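
For readers who haven't been bitten by this: a Python function that assigns to a column of a DataFrame it was handed mutates the caller's object unless it copies first, which is exactly the surprise being alluded to. A minimal sketch:

    import pandas as pd

    def add_ratio(df):
        df = df.copy()   # without this line, the caller's frame is modified too
        df["ratio"] = df["a"] / df["b"]
        return df

    original = pd.DataFrame({"a": [1, 2], "b": [2, 4]})
    result = add_ratio(original)
    print("ratio" in original.columns)   # False, because we copied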


> and why you need pd.DataFrame.copy() all over the place.

Thanks for saying this!

Also, in the past decade, the R data analysis/munging ecosystem has matured far more than the equivalent in Python. These days, writing (tabular) data transformation code in R is often far cleaner, clearer and less error prone than any other platform imo.


Many senior members of the academic community rely on their reputation as researchers to brush aside basic issues with the software that they develop for scientific purposes. These include the lack of testability, debuggability, reproducibility, separation of concerns, documentation, or usability. The lack of focus on research software quality among senior PIs, funding committees, and article reviewers is a huge problem in academia. The problems with the Ferguson model and codebase are just the latest prominent example of this.

The tools that academics have access to are excellent. If you compare the free support academic software developers receive from the rest of their community to other engineering disciplines, it's beyond great. I think the OP is wrong to suggest otherwise.

The problems with the Ferguson model are an opportunity to educate more members of the community about the fact that good software development practices are not optional for good science, and that senior members of the community jeopardize their own reputation by not paying attention to them.


> These include the lack of testability, debuggability, reproducibility, separation of concerns, documentation, or usability.

Or maybe these things aren't actually as important as we think they are in professional software development?

If they're able to produce useful scientific results (in general, not specifically in this case) without those things then maybe they don't matter as much as we think they do?


These qualities are important for exactly the same reasons they are in production: Without those qualities, your code is brittle, your deploys are brittle, changes are brittle.

It's just like saying "It runs on my machine". The scientific term for this is "Replication crisis [0]"

0 - https://en.wikipedia.org/wiki/Replication_crisis


The Replication Crisis is much larger in scope than just software reproducibility. But yes, it does include that too.


Certainly. I think the "it works on my machine" attitude is reminiscent of the problems associated with the replication crisis.


> Without those qualities, your code is brittle, your deploys are brittle, changes are brittle.

Is this brittleness stopping the scientists achieving what they need to achieve?

Are you sure that writing tests makes science better? Or are you just assuming that?

They aren't idiots and they aren't ignorant of how professional software developers work.


Clearly not, but that's because what they need to achieve is publication of something interesting, not something correct.

Academics have a lot of excuses for writing terrible code. It's really shocking. So far I've seen:

- It doesn't matter if the code is buggy because the results don't need to be accurate

- We can't afford to write good code.

- We aren't properly trained to write good code.

- We just average the results and that fixes the bugs (!)

- We aren't computer scientists so what do you expect.

- Science is special which means we don't need to write tests to get correct results. We just eyeball them and know if they're right (so why bother writing a model at all then?)

- You can't criticise or judge us because mere software developers can't understand science.

- Nobody told us it's easy to screw up memory management in C

And a whole lot of other baffling and insulting nonsense. How many of these excuses would be accepted if a private company produced a dangerous product due to bad code, and produced this litany of BS in a courtroom? None.

They aren't idiots and they aren't ignorant of how professional software developers work.

It's apparent from some of the responses to this fiasco that they are totally ignorant of how professional software developers work. And they're proud of it, which in my view makes them idiots too. You can't both tell governments and whole societies to "follow the science" and then blow off any suggestion of working to professional standards.


> Are you sure that writing tests makes science better? Or are you just assuming that?

It is perfectly acceptable to write code without tests. Proof-of-Concept or Minimum Viable Product are a great place to write code without tests.

It is less acceptable if other people will run or use that code. It is even less acceptable if anyone (including oneself) ever updates or extends the code.

---

You could take this analogy to scientific instruments. Imagine you make a novel particle detector. You get a scientific result with your detector.

A colleague uses your detector, but they don't clean it properly before use, and they use a power supply with lower voltage. They don't detect any particles! Was your science bad? Would the "science be better" if there were clearer instructions and pre-requisites?

Now imagine another scientist makes a copy of your detector from the description in your paper. They get some stuff wrong because your description was ambiguous. Was the science bad?

---

By the way, all of these things are real problems with scientific investigations, and not just in the software realm.


If the results aren't reproducible, they can't be assumed to be true. Then they're only useful if you only care about publication and not about whether the results are actually true.

And yes, this is a serious problem in science.


> If the results aren't reproducible, they can't be assumed to be true.

I don't understand why having brittle code would mean that the results are not reproducible? You don't need to modify the program to do a reproducibility study.


Reproducibility is one of the issues mentioned earlier in the thread. And being able to audit the code and understand what it actually does seems rather important too.


Move fast and break things has its downsides: work that influenced public policy and political debate was retracted after an error in the model was discovered [1], and errors have been found in widely cited databases [2]. These are things that should be avoided, I'm sure you'll agree.

Science has this additional problem that its memory is short - mistakes seem to be discovered when work is in the long tail of the citation curve, once it's out of the news. Even if you retract a paper, there is no easy way to trace the contagion to the work that uses it. That's before you consider mistakes that might be deliberate [3].

I have no doubts that more software/data rigour would make science more accurate, but the cost would be substantial, and it would no doubt slow down discovery until the benefit of open source kicked in.

[1] https://theconversation.com/the-reinhart-rogoff-error-or-how...

[2] https://www.the-scientist.com/news-opinion/mistaken-identiti...

[3] https://retractionwatch.com/2016/09/26/yes-power-pose-study-...


> Or maybe these things aren't actually as important as we think they are in professional software development?

There's something to this. Most academic software is a simple one-off development, with little consideration for long-term maintenance. OTOH it's not likely that undocumented, untested and unfixable software can produce actual "useful scientific results".


The problem is, we need confidence that the software is correct in order to trust the scientific results.


Do you need tests to do that? Are tests the only way?

Are you doing formal verification of your software? Why not? If you're not bothering to do that why are you criticising researchers for not bothering to use tests?


As a former academic, the point is to get the research published. Whether the code continues to work after that is irrelevant.


This is, indeed, the whole truth and nothing but the truth. My own anecdote: through happenstance I'm co-author on a paper, a large number of my friends are PIs/PhDs etc and one of them needed help after the software written for them by the CS student they'd been assigned didn't work (he got a masters for it, I can't believe anyone in the CS department looked at his work). I asked about publishing my code after the paper was published and no one seemed bothered.

I've also seen my friends on the mill of, shall we call it paper chasing? That seems most apt. It reminded me of borrowing someone's homework and making minor changes to get it past teacher - except the homework was the same paper being constantly rejigged for submission to a new journal. Not much truly new work seemed to be going on.


> maybe these things aren't actually as important as we think they are in professional software development?

Maybe for professional professionals they are.

It is funny that Hinsen's insight is so mundane and obvious: "the code itself is merely a means to this end".


I couldn't agree more. We're in the middle of a reproducibility crisis. It's extremely important that researchers start putting more effort into quality research instead of quantity. "Publish or Perish" is killing academia.


> These include the lack of testability, debuggability, reproducibility, separation of concerns, documentation, or usability

Have you actually checked to what extent the Ferguson codebase is guilty of these things? It seems to me there are people spreading misinformation, and too many people taking their word for it.


Academics are not software people, by and large. The code is custom built for a specific graph or statistical run. Typically the people writing the software are grad students with little to no interest in code; they just want the answer. Their interest is in the thing they are studying, not code. They may not even have coded before grad school, nor know matrix algebra or calculus. Excel is a big step for them.

I have experience in this issue. A fellow grad student of mine asked for some help with his code. I said sure, but it'll cost you a 12 pack of beer. After the 11th nested 'if' statement in Matlab, I upped it to a 24 pack. We never did get it working right.


> If you compare the free support academic software developers receive from the rest of their community to other engineering disciplines, it's beyond great.

Could you elaborate on what you mean by that?


Not the OP, but I know of at least one University that runs "Programming for scientists" courses at low or no cost.


These courses are usually short introductions into python or Java. Most scientists simply don't have time in their curriculum for a full software engineering course.


I didn't claim they were good. My father is doing a Python one right now and I'm not very impressed. He is retired from the University but is still eligible to take the course.


Oh we teach programming alright, but software development is another pair of shoes entirely. Unfortunately, that's where the problem lies...


That's the whole problem with the "Learn to code" movement, tbh. Plus Python and Java are terrible languages when it comes to those broader swdev/softeng concerns.


What is better than Java?


Science is inherently a low stakes game. Most scientists are competing for budgets of no more than a few million a year, if they are really successful. There is little reward for producing excellent code, but a lot of reward for producing atrocious code to publish in prestigious journals.

If you want to be truly horrified, look into the practices surrounding NEURON, a simulation tool that has been used to publish simulations since the 80s. You have to write code in a domain specific language called Hoc, and that code is typically littered with hard-coded constants. Code is copied over from one paper to the next, etc. Code is a means to an end, not something that is held in high regard. Moreover, improving someone else's code style or quality won't earn you a degree or praise. Starting a competing "higher quality" project is bound to annoy or anger the established players. It will be hard to get funding for it, because those players will likely review your proposal. In any case there is typically a 10-20 year gap between the people making the decisions and the people actually doing the practical work.

Finally in some cases code is a competitive advantage, I'm sure that Neil Ferguson has churned out quite a few papers with the same framework by simply tweaking some things here and there. In that case you are ill advised to share the code with anyone.


I'm somewhat skeptical. Firstly, I don't think the language is the problem with scientific code. You can write messy code in any language. So the warning then has to be about writing software in general. In that case, I think a warning like "don't try to write software unless you have years of training" is a bit much. Many people with no training learn to write nice code. Many projects made by amateurs might have ugly code but still add something to the world (eg. many games).

The problem here is the project is influencing decisions in healthcare.

Having worked in HPC and academia, I've seen code like this a lot. There are two archetypes I've noticed: (1) the well-meaning older academic maintaining legacy code, who has often done a lot of convergence testing but still has code that isn't up to modern engineering practices, and (2) the domain expert with the attitude that "programming is much easier than my area of domain expertise". These are problems that require attitude changes within academia, not better warnings on online tutorials. The second group are going to ignore the warnings anyway.

Remember many of the people writing this academic code also teach programming courses in their departments! They view themselves as programming experts.


I'd like to point out I've met a lot of software engineers that subscribe to two in reverse. "This domain area of expertise is much easier than programming, ergo I am qualified to solve it."


I think it's true of many experts, that they see their own area of expertise as the most important one, and the others as relatively minor.


This model didn't just influence decisions in healthcare. It single-handedly changed the UK government's strategy over this pandemic.

From what I understand the UK was planning on beating COVID by creating herd immunity, similarly to Sweden. Then this model came out and everyone started yelling that Boris wanted to kill your grandma.

The problem is that it's impossible to have an intelligent discussion over this. This pandemic became a partisan issue. We're not discussing whether one of the most impactful decisions made by a government this generation should be based over absolute trash code. You're either uncritical of the lockdown or "anti-science".


> creating herd immunity, similarly to Sweden.
> ...
> The problem is that it's impossible to have an intelligent discussion over this.

As far as I can tell the Swedish government never had this plan. It was mentioned in an interview and dismissed as unworkable, journalists misunderstood.

On the other hand the UK government appears to have had no plans whatever until jolted into action by the fear that public opinion would turn against them.

What Sweden has done is similar to Norway, where I live, which relies largely on voluntary changes in behaviour and temporary closure of institutions and businesses that require close contact between employees and customers. But Sweden took longer to implement those measures and also Swedish society is different from Norway, anecdotally Swedes seem to me to be more urban people than Norwegians and more gregarious.

Exactly why Sweden has a much higher death rate, 36/100k inhabitants versus 4.3/100k in Norway, is unclear at the moment partly because of different definitions but also because of differing conditions, and the epidemic being at different stages in the two countries.


It seems the government was following this document at the start: https://assets.publishing.service.gov.uk/government/uploads/...

The reason it seemed they were doing nothing is passages like these:

ii. Minimise the potential impact of a pandemic on society and the economy by:

• Supporting the continuity of essential services, including the supply of medicines, and protecting critical national infrastructure as far as possible.

• Supporting the continuation of everyday activities as far as practicable.

• Upholding the rule of law and the democratic process.

• Preparing to cope with the possibility of significant numbers of additional deaths.

• Promoting a return to normality and the restoration of disrupted services at the earliest opportunity.

There's way more, but I've honestly not read it all. But there was a plan, drafted before this epidemic.

Public opinion was turning against the government, but it actually kept course for some time. Something I was honestly impressed with. What made it drop the plan was Neil Ferguson's study.

There are many reasons for criticising the plan. This article is pretty good. https://www.theguardian.com/politics/2020/mar/29/uk-strategy...

What really gets me is that if the lockdown was the correct decision, we arrived there for the wrong reasons.

This paper had such an outsized impact that it should be held to a higher standard. And it's scary (but not really unexpected) that the government is making decisions of this magnitude based on such a shaky foundation.


> beating COVID by creating herd immunity, similarly to Sweden

The big problem here is that herd immunity requires that either you have a vaccine or you get some large fraction of the population infected, over 50%.

The death rate is about 1%, plus further people suffering long-term complications.

So achieving herd immunity in the UK would require about 300,000 dead.
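
The arithmetic behind that figure, using the usual herd-immunity threshold of 1 - 1/R0 and the rough values quoted above (R0 of 2.4, fatality rate around 1%, UK population around 66 million; all approximate):

    population = 66e6        # UK, approximately
    R0 = 2.4                 # estimated reproduction number
    ifr = 0.01               # ~1% infection fatality rate

    threshold = 1 - 1 / R0   # ~0.58 for R0 = 2.4
    print(population * 0.50 * ifr)        # ~330,000 deaths even at a 50% floor
    print(population * threshold * ifr)   # ~385,000 at the full threshold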


Yeah, requiring years of training to write anything is nonsense. Everybody should learn to write code, and there's tons of interesting stuff you can do without knowing software engineering best-practices. Not every scientific model has to scale to industrial scale or be maintainable by many people over many years.

My problem is entirely with the article that blames the tool they chose and the software engineering community that didn't put big warning stickers on that tool.


John Carmack has reviewed the code and didn't seem to find it all that bad, so there's that.

https://twitter.com/ID_AA_Carmack/status/1258192134752145412

https://twitter.com/ID_AA_Carmack/status/1244302925855326209


Really?

> Imperial are trying to have their cake and eat it. Reports of random results are dismissed with responses like “that’s not a problem, just run it a lot of times and take the average”, but at the same time, they’re fixing such bugs when they find them. They know their code can’t withstand scrutiny, so they hid it until professionals had a chance to fix it, but the damage from over a decade of amateur hobby programming is so extensive that even Microsoft were unable to make it run right.

That's from the first analysis[1]. There's a follow up[2]:

> Sadly it shows that Imperial have been making some false statements.

and

> It’s clear that the changes made over the past month and a half have radically altered the predictions of the model. It will probably never be possible to replicate the numbers in Report 9.

[1] https://lockdownsceptics.org/code-review-of-fergusons-model/

[2] https://lockdownsceptics.org/second-analysis-of-fergusons-mo...


Yes. John Carmack says the code is OK, some anonymous person on the lockdownsceptics.org website says it's a flaming heap of garbage.

For what it's worth I looked into some of the tickets linked in those articles and concluded the author is, broadly speaking, full of shit. I am nobody in particular though.


> John Carmack says the code is OK

He doesn't say it's okay, he engages in a weird kind of whataboutery like "Heck, professional software engineering struggles mightily with just making completely reproducable builds". I struggle to note one part of the article by the "retired software engineer" (as if that has any relevance either) that he deals with specifically.

But since it's John Carmack we must let him wave his hand and say it is so. The Github issues are also far more enlightening than Carmack's tweets on this, but again, who cares for precision and points argued with evidence when we have a name giving their opinion?


> The Github issues are also far more enlightening than Carmack's tweets on this

Totally agree. The "lockdown skeptics" articles significantly misrepresent the github issues. They imply that there are mysterious uncertainties creeping into the results, when the actual issues relate to things like failures to set RNG seeds consistently, or a checksum failing in a test due to floating point rounding differences in Cray supercomputers' native instructions. Most readers aren't going to investigate the github issues though.
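
For anyone surprised that a checksum can legitimately differ across hardware: floating point addition is not associative, so summing the same numbers in a different order (or via fused multiply-add, as some architectures do natively) changes the last bits of the result. A trivial demonstration, unrelated to the covid-sim code itself:

    import random

    random.seed(0)
    xs = [random.uniform(-1, 1) for _ in range(100_000)]

    forward = sum(xs)
    reordered = sum(sorted(xs))
    print(forward == reordered, abs(forward - reordered))
    # typically False, with a tiny difference: numerically harmless,
    # but enough to break an exact-match checksum test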


That's a valuable pointer, thanks for sharing :-)


This letter feels as though it is overlooking a large point of contention.

>The scientists who wrote this horrible code most probably had no training in software engineering, and no funding to hire software engineers

Shouldn't the argument be, that for research that is reliant on coding models, funding be allocated to experts that can assist in creating said models (software engineers)?


The conclusions from the first critical code review cited:

All papers based on this code should be retracted immediately. Imperial’s modelling efforts should be reset with a new team that isn’t under Professor Ferguson, and which has a commitment to replicable results with published code from day one.

On a personal level, I’d go further and suggest that all academic epidemiology be defunded. This sort of work is best done by the insurance sector. Insurers employ modellers and data scientists, but also employ managers whose job is to decide whether a model is accurate enough for real world usage and professional software engineers to ensure model software is properly tested, understandable and so on. Academic efforts don’t have these people, and the results speak for themselves.

https://lockdownsceptics.org/code-review-of-fergusons-model/


That's the most biased opinion imaginable by an astroturf group. There's no information on who they are, but the article currently on their front page is by Toby Young, who is a Spectator/Quillete guy and to be found backing almost all stupid ideas within British politics. https://www.spectator.co.uk/article/This-lockdown-may-kill-m...


Yeah really, why don't we hand all modelling duties to companies who optimize for more money, instead of leaving them with the only actor trying to optimize for actual public health?

What could go wrong?


If you're willing to dismiss all companies as optimizing for more money, it seems only fair to say that academics optimize for prestige and publication in good journals.


That's also true. Now the million $ question:

Whom would you like our society to rely on to generate quality work?

I don't think there is a satisfactory answer to this question. Public research becomes more and more of an industry every year with the publish-or-perish game, while a solely private solution is obviously open to very biased conclusions.

There is no smart solution to a stupid problem. But the truth is that _as an institution_ the NHS is the only actor whose mission is to optimize towards public health.


In most fields, our society relies on private industry to generate quality work, even when the work is very important and doing it wrong might kill people. I'm not an anarchist, I do recognize there are reasons that the government should provide some things. But the idea that private industry uniformly produces bad results because they don't care about anything but profit just seems silly to me. Producing good results is profitable!


> In most fields, our society relies on private industry to generate quality work ...

This is just not true. Military, police, courts? What does "most fields" even mean?


My job, food, apartment, utilities, entertainment, all come from private companies.


> it seems only fair to say that academics optimize for prestige and publication in good journals.

On one hand you have lots of people arguing that the legal duty of a company is only to make money for its shareholders. When large companies fail at that goal, it's bailout time.

On the other hand you have peer-reviewed journals where authors are incentivized to find accurate results, and researchers will cite articles on the basis of their veracity (or, if incorrect, as punching bags). Of course that's a fallible process and just as vulnerable to cronyism, but when researchers are caught cooking the books they're discredited, not rewarded.


Authors aren't actually incentivized to find accurate results, but rather publishable results, which typically means novel. Researchers also cite articles based on their impact, not their veracity. There are plenty of instances of retracted results continuing to be cited as if they are still accurate.

There are issues with both industrial and academic research, but I do think that industrial research is more transparent in its motivations.


Everyone working for a living optimizes their work towards making more money. That doesn't change for people funded with public money. It's pretty widely accepted that the people in charge of public funding (politicians) sometimes act outside of the public's interest for self-gain.

I make no claim as to which achieves better results for the public because it's such a complicated problem, but I think it's rather naive to just assume publicly funded incentives are more aligned with social health than private incentives.


That last part is mindblowing given the fact that the health insurance industry in the US is trying to argue it shouldn’t have to pay for COVID-19 treatments because it’s part of a pandemic and not part of normal medical treatments.

I guess if your model is:

   if pandemic and COVID:
       is_covered = False
You can have a very clean pandemic model.


That was a shitty code review. Seeding issues like the ones cited don't affect the results of a Monte Carlo simulation, and there are tests in the repo, just not automated ones.

The section you quoted shows the reason for the review's sloppiness. The reviewer set out to find a way to justify their own beliefs instead of to actually read the code.


> That was a shitty code review

Damn straight. The only sense in which it was not a shitty code review, is that it didn't actually review any code. Looking into the linked tickets is a time consuming faff, but people should actually do it before taking the blog post at face value.


I agree. Moreover, the cited bug ("predictions varied by around 80,000 deaths after 80 days") doesn't really seem to impact the overall policy implications: no lockdown means an exceptional number (400 to 480k) of deaths.

Unless the entire simulation is bogus, it comes off as nitpicking.


Seriously?

Doing a Monte Carlo simulation means you adjust the seeds to get different runs. It doesn't mean your program can read uninitialised memory or reuse variables that weren't reset to zero and still be correct.

Where are people getting this idea that you can just average away the results of out-of-bounds reads and race conditions?
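
The underlying statistical point: averaging over many seeded runs shrinks the variance of a Monte Carlo estimate, but it does nothing to a systematic error such as state carried over from a previous run. A toy illustration of the principle (not the actual bug):

    import random

    def one_run(seed, leftover=0.0):
        rng = random.Random(seed)
        # 'leftover' stands in for state that should have been reset to zero
        return leftover + sum(rng.gauss(10, 2) for _ in range(1000)) / 1000

    correct = [one_run(s) for s in range(500)]
    buggy = [one_run(s, leftover=3.0) for s in range(500)]

    print(sum(correct) / len(correct))   # ~10: averaging removes the noise
    print(sum(buggy) / len(buggy))       # ~13: averaging does NOT remove the bias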


What does reading uninitialized memory or reusing variables that weren't set to zero have to do with seeding issues? Read my comment again and reply to its content instead of making up a comment that you would like to reply to.


Yeah that's a weird set of conclusions there. I would think the papers based on it want another look though.


Back in 2008-2009 my then PI tried this: it worked for a while, but then it proved to be untenable. Not because there wasn't enough money (there was) but because the university did not like the idea of bringing someone on board just to help researchers develop software, and thus there were so many roadblocks that at one point it was impossible to go on.


I'm interested in hearing more details on this, if you're able to discuss them.


I don't honestly know how much I can disclose (FTR, I was not the one doing that job - I was working as a postdoc there at the time), but I'll try to summarize it briefly:

- PI got a couple of EU funded grants

- PI wanted to build a program / programs out of some ideas he had before moving to the institution he was currently employed at

- PI hired a software developer (actually a software engineer) to do this job

- Statistician drafted algorithms, developer created the software

- I used the software for my Ph.D. thesis, got hired at the lab of the PI as a postdoc, new requirements arose due to the way I used it

- Developer made a preliminary new version of the software following discussions with me

- University made it very hard to keep developer on the team, due to bureaucracy and kind of hostility towards this kind of employment

- At some point (I can't recall the details exactly, but it was something that spanned almost one year), the form of keeping the developer on board was no longer possible

- PI offers an alternative contract, but it is financially wasteful to the developer (not the fault of the PI, but the way certain things work in my country)

- Developer leaves the project

Also the university, to my knowledge, complained that the developer cost a lot (IIRC, the project was paid at market rate, so in line with other, non academic software projects).

I can't comment on the quality of the software we used (sadly it was never open sourced) as it was in Java and I only have a passing understanding of the language, but the approach of having a dedicated developer IMO worked (and also netted quite a number of publications over that period).


For this to happen there would need to be a nationally accepted PE certification for 'software engineers' on par with other engineering disciplines. It's unreasonable to expect non-experts, and experts from other fields, to perform credentialing on a case by case basis.


There is no "nationally accepted" PE license for any discipline. It's a state-level credential and reciprocity is not universal or automatic.


I included the "nationally accepted" bit since I'd prefer to see some incremental improvement over current models, but I don't have confidence that an international standard would be accepted in all relevant contexts. It could still be administered at the state level.

I'm also ignoring the requirement that candidates must start with a degree from an ABET-accredited institution, which seems to feature prominently in proposals from IEEE and others. Ideally I think there should be some way around that, but alternatives I'm familiar with aren't great either (e.g. FINRA).


> experts

Being an expert means you know your tools well. If you can't code well and coding is a critical part of your toolset, then you're not an expert yet. You don't get a free pass because you allegedly are strong in other parts of your craft, especially if the part of your craft you're weak in can be this problematic.


I disagree so strongly with this that I had a visceral reaction while reading it.

C++ is a tool, not an end product. If you're not qualified to use a tool correctly it's not the manufacturer's fault, it's yours.

Why do so many people believe that good software development is not part of their job? If you write code then you're a developer, no matter your job title. If you write shit software, saying you're a researcher is not an excuse.


This is such a ridiculous comment I don't know where to start.

Are you actually proposing that being a fully experienced & knowledgeable software engineer should be a base requirement for all academic research (in any field)?

If you, working as a software engineer, were told tomorrow by your manager that you needed to perform heart surgery, and that that now fell under your responsibilities in your current role, would your argument be the same?

The private sector's typical response to the problems in the article would be to hire qualified software engineers to assist researchers. Funding dictates this is impossible, so they make do.


> Funding dictates this is impossible, so they make do.

Put another way: software quality is not valued in academia. Let’s say the code in question was in great shape. Would it have mattered at all in the trajectory of anyone’s career? No. This is probably the one academic code base in a million that has received any negative reputational hit due to its quality.


> software quality is not valued in academia.

And in general, it shouldn't. Code written by scientists is almost always throwaway prototypes. So it absolutely doesn't matter how decoupled, or easy to extend it is. The only aspect of quality that matters here is the simplicity, in the sense "it obviously has no bugs" (and related practices like testing). This is something that could be taught, but I think it should be also addressed from the other end of the spectrum - more pressure put on verification and reproducibility of results.


It doesn't have to be decoupled or easy to extend, but it should be readable enough to ensure it does what it's supposed to, and well-tested enough to verify it does what it's supposed to.


I think Blaise Pascal's (or whoever really said it) remark is apposite here:

    If I Had More Time, I Would Have Written a Shorter Letter
https://quoteinvestigator.com/2012/04/28/shorter-letter/


There's plenty of funding in academia. Come on. It's a huge segment of society, governments spend billions on it.

They don't hire software engineers to write their models because they know they can get published without that and they'd rather hire more students into their department and publish more papers. For academia to claim they can't afford to produce programs that work is absurd and damning - why should anyone believe anything in a scientific paper if this attitude is so widespread? Some people here are arguing it's ridiculous to expect academics to produce work that's correct, even putting assumptions to one side.

All it takes to fix this is multi-disciplinary teams. Of the type found in almost any industry but apparently, not academia. Why is this idea so strongly rejected by academia when it's just common sense everywhere else?


If you write software for your research then yes.

Just as I would expect you to have a solid basis in statistics, you need more than a 101-level competency in programming / software engineering if that's what you use to do your job. Same goes no matter what tools you're using, you need something beyond basic competence or advice from someone who does.

Nothing that I do day-to-day is remotely like surgery or in any way healthcare related, so that's a stupid straw man. Asking the same question of the academic _studying healthcare_ would be more apropos, but equally stupid and nonsensical.


Where in my comment did you see me proposing that "being a fully experienced & knowledgeable software engineer should be a base requirement for all academic research (in any field)"

Go back and re-read it please. I'm saying you need to be capable of using the tools for your job.

If I was working as a software engineer, I'd not be asked to perform heart surgery. A more realistic example would be a software engineer being tasked to develop software in the medical industry, where I would expect them to understand the subject matter enough to ascertain their software is not crap.


Sorry but there's a pretty big gap between understanding the domain for which your software is intended, and being qualified to work as a professional in that domain. The latter is what you're asking of academics.


I guess we are in agreement then ;)

If academics are writing predictive modelling software, I ask that they are qualified to do so.


> Where in my comment did you see me proposing that "being a fully experienced & knowledgeable software engineer should be a base requirement for all academic research (in any field)"

> I ask that they are qualified to do so.

You need to make up your mind on this one. Are you asking this or aren't you?


Being -- capable of writing quality software -- should be a base requirement for -- Anyone developing software used for academic research -- in any field -- in which the conclusions are based upon the correctness of this software.

Not all academic research depends on writing software.


> "The private sector's typical response to the problems in the article would be to hire qualified software engineers to assist researchers. Funding dictates this is impossible, so they make do."

But that is a bizarre conclusion. Is it really cheaper to let scientists focus on something they know nothing about, than to hire an expert to do it? Do you cut costs by letting scientists mop the floor, clean the toilet and guard the entrance?


> Is it really cheaper to let scientists focus on something they know nothing about, than to hire an expert to do it? Do you cut costs by letting scientists mop the floor, clean the toilet and guard the entrance?

Believe it or not, mean software engineer salaries are marginally higher than those of janitors and door security.


But easily worth it if it saves the scientist a lot of time and makes the implementation of their model more accurate and reliable.


Well, even if, the tool doesn't come with a warning. Real-life tools come with them.

(The question is, what kind of warning would be useful on a programming language? C/C++ perhaps should come with a booklet about pointers, but other than that, I can't think of any but one: "WARNING: this is a tool for extreme clarification of thought; if you're confused about your subject matter, your project will produce wrong results. As you write code, take steps to verify your understanding of your problem domain.")

More importantly, our industry is spending a lot of time and marketing money on convincing that everyone can code, that it's just a matter of picking up a JavaScript tutorial and pressing F12 in the browser, and you too are now a software developer. These two views are in direct opposition to each other.


Some tools are inherently safer than others. The whole difference between C++ and Rust is that it's a bit harder to shoot yourself in the foot while using Rust, the way you can with C++. It's not that you can't do it, but generally it involves purposely bypassing its safety mechanisms. You're still supposed to know what you're doing, but you know where you need to pay attention.


Good software development is missing from most academic promotion tracks. "Research Software Engineers" are a relatively new thing, which is an admission that academics are not very good at software engineering and supports the notion that they can't reasonably be expected to be experts in two disciplines.

There is a large amount of software being written by non-software-engineers (most of it in Excel), so I think it's more constructive to say that the solution lies somewhere other than expecting academics to become software engineers. If that sort of requirement were in place, it would be hard to argue against software engineering becoming a chartered engineering discipline with liabilities and qualifications, which would be very disruptive to a lot of the industry.

It is worth remembering that academic code sharing is rare in and of itself. This reaction is likely to cause other academics to shy away from publishing their code. Perhaps it would be better to give them an opportunity to make their mistakes public, so that more skilled engineers can help, before the credibility of their results is used to justify drastic government policies?


> If you're not qualified to use a tool correctly it's not the manufacturer's fault, it's yours.

It certainly _can_ be the manufacturer's fault, if the manufacturer's salespeople lied to you about the learning required to master the tool.

I think the point of the article is that collectively, as a community of experts, we have been misguiding non-experts into thinking that C++ is a good tool for their job.

A non-expert cannot make an informed decision on which tool to use, they _have_ to rely on advice from experts to pick their first tool, so if they feel like we're recommending C++ (and I agree with the author that this is not a good tool for non-experts), then that's, in a sense, on us.

(That being said I did not inspect the code, so I don't know if the author's implication is right that C++ is the core problem of the code base)


This is quite off the mark. The author sets the bar too low for himself by addressing only the most easily (and, to be fair, legitimately) dismissed criticisms of the Imperial College model.

Here's a better-laid-out critique that the OP doesn't speak to:

The Imperial College modelers released the source code a couple of days ago to the model that shut down the world economy. It's not the original model code, but rather the original source code turned over to volunteer programmers who re-wrote it so that it is more readable. I have done some review of financial models in the past, but without the source code I would not be able to do a full review of the Imperial College model. Now that we have the source code (sort of), I can.

Any such model ought to have been independently reviewed before it is ever used for real policy decisions. Policy analysis is awash in models but no one ever really checks them. Going forward, health policy makers should ask for and disclose independent validation of any model before using its results to make recommendations of any consequence.

Normally, model reviews are long technical documents but there would also be a summary section. Here's what I think a summary should have looked like.

Overall conclusion: this model cannot be relied on to guide coronavirus policy. Even if the documentation, coding, and testing problems were fixed, the model logic is fatally flawed, which is evidenced by its poor forecasting performance.

https://www.facebook.com/scarlett.strong.1/posts/25243721950...


> Any such model ought to have been independently reviewed before it is ever used for real policy decisions. Policy analysis is awash in models but no one ever really checks them. Going forward, health policy makers should ask for and disclose independent validation of any model before using its results to make recommendations of any consequence.

That's ignoring the time-limited nature of a virus response. "No decision" is itself a decision. Delay has a real cost, and using a potentially imperfect model is simply using the best information at hand.

I think everyone would agree that the ideal is to have well-documented, thoroughly-tested, easy-to-use, and bulletproof models to inform public response to every emergency. However, those models can't be built instantly, and that kind of bulletproofing is not directly relevant to the day-to-day research work aided by such models.

Adding research capacity to ensure that governments have a stable of well-researched, thoroughly-vetted models for emergencies would be a great thing, but it would also be quite expensive. Keeping any single model up to spec might be the job of 1FTE (so an additional ~$150k/grant/yr -- large for a research budget but small for a government), but that would have to be multiplied by every area where the government might possibly want research-informed decisionmaking on short notice.

> Even if the documentation, coding, and testing problems were fixed, the model logic is fatally flawed, which is evidenced by its poor forecasting performance.

That sounds like a great scientific criticism! Model validation is the cornerstone of research on computer models of things, and finding a poor forecast opens the door to many great research questions.

But without that further research, "poor performance" sounds a (loud) note of caution, but isn't necessarily fatal. The leading-order problem is to understand why the model performed poorly: was it improperly calibrated with information known at the time (such as if the virus behaves differently than assumed)? Was there some out-of-sample feature of the forecast that the model would not expect to do well on (e.g., low death rates in an open society because everything was shut down for severe weather anyway?) Is an overall trend correct but the timing in error?

"Flawed," I think, can be easily shown, and this should probably be expected in a research model. "Fatally flawed," however, is a stronger claim that must pass a greater burden of proof.


The quickest way to reach a decision is random number generation. Should people go with that? Governments across the world had loads and loads of time.

> Adding research capacity to ensure that governments have a stable of well-researched, thoroughly-vetted models for emergencies would be a great thing, but it would also be quite expensive. Keeping any single model up to spec might be the job of 1FTE (so an additional ~$150k/grant/yr -- large for a research budget but small for a government), but that would have to be multiplied by every area where the government might possibly want research-informed decisionmaking on short notice.

If you have a once-in-a-century crisis and are still thinking about saving a few million, while being sure your crisis response will cost billions, your policy is already flawed and not much can be done to help you.


>If you have a once-in-a-century crisis and are still thinking about saving a few million, while being sure your crisis response will cost billions, your policy is already flawed and not much can be done to help you.

By the time you're in the crisis, it's too late to make software bulletproof. Not only does that process take time, but it also needs to be integrated with the whole of the research effort up to that point.

So you can't pick and choose topics: you need the funding to make bulletproof every potential policy-related model you might need in an emergency. That's where the costs add up, since we don't have a time machine to go back and fund exactly the lines of research we in fact need at the moment.


But nobody is asking for bulletproof software here. Nobody is asking for perfect MC/DC coverage.

If governments are so incompetent that they can't foresee a crisis like this even one month in advance, when even sanitizer hoarders have better foresight, then of course nothing can really help. Governments spend billions and trillions on national defence, and an issue like this, when it's almost at the door, should be given the same priority as national defence.


It's worth noting that in a pandemic situation, waiting before taking action is "making a decision"

Shutting down the economy is costly, yes, but allowing a virus to continue its exponential growth with all of the unknowns that entails is much more costly in expectation (we can only speak in expectation given the unknowns)

Whether or not this model should have been the basis for such a decision I don't know, but it's important to keep in mind that taking proactive steps in a pandemic is conservative


The government wasn't waiting before taking action. The Imperial report was impactful because it said that the government's actions were insufficient, that if they continued on with only moderate social distancing people would die in the streets for lack of hospital beds.


> the model logic is fatally flawed, which is evidenced by its poor forecasting performance.

Where's the proof the model is flawed? The criticism goes both ways: if on one side the model lacks documentation, it's also fair to say that anyone claiming it's "fatally flawed" needs to point out where in the code the issues are.


> Consider what you, as a client, expect from engineers in other domains. You expect cars to be safe to use by anyone with a driver’s license. You expect household appliances to be safe to use for anyone after a cursory glance at the instruction manuals. It is reasonable then to expect your clients to become proficient in your work just to be able to use your products responsibly? Worse, is it reasonable to make that expectation tacitly?

I don’t buy this argument. Cars are safe, yes. C++ isn’t the car, though. C++ is the dangerous machine shop you use to build the car. A better analogy would be comparing a car to a web browser, and indeed incredible effort has been put into keeping web browsers secure.


Well, it looks like one of the biggest problems in computational biology has finally blown up in our face.

I work in a very similar field to the one discussed here - ecosystem modelling. And much of the code I see is probably of a similar quality to Neil Ferguson's model. (Although I haven't had a detailed look at his work.) For all you angry devs out there, I have three comments:

1. Yes, we have a problem with code quality, and no, this should not be acceptable where policy-relevant decisions of such magnitude are concerned. (Although one should bear in mind that all science is flawed and its achieved reliability will always be limited by constraints of time/budget/etc.)

2. However, changing the status quo is really hard. Specifically, the two greatest changes that are needed are to teach proper software development practices to students in the natural sciences (just like we teach lab technique), and/or make it easier to get funding for software developer positions in a research team. This sounds easy, and ought to be easy, but both run counter to some pretty entrenched ideas held by the "old folks at the top" in universities and funding bodies. (Believe me, I'm speaking from experience :-/ )

3. But yes, we are working on it. The debate is growing, and there's a new generation of computational biologists with much closer ties to computer science who are trying to shake things up a bit. In some sense, we are several decades behind the wider CS world in the techniques we use, but we're catching up. And at the same time, we're starting to develop some of our own methods of quality control (such as pattern-oriented modelling).


If you're interested, here is some further reading:

* DeAngelis, D. L., & Grimm, V. (2014). Individual-based models in ecology after four decades. F1000prime Reports, 6(June), 39. https://doi.org/10.12703/P6-39

* Grimm, V., Berger, U., DeAngelis, D. L., Polhill, J. G., Giske, J., & Railsback, S. F. (2010). The ODD protocol: A review and first update. Ecological Modelling, 221, 2760–2768. https://doi.org/10.1016/j.ecolmodel.2010.08.019

* Grimm, V., & Railsback, S. F. (2011). Pattern-oriented modelling: A ‘multi-scope’ for predictive systems ecology. Philosophical Transactions of the Royal Society B, 367(1586), 298–310. https://doi.org/10.1098/rstb.2011.0180

* Nowogrodzki, A. (2019). How to support open-source software and stay sane. Nature, 571(7763), 133–134. https://doi.org/10.1038/d41586-019-02046-0

* Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2016). Good Enough Practices in Scientific Computing. 1–30. http://arxiv.org/abs/1609.00037


I chuckled when I saw the author of this article. Many years ago I was a scientist (PhD student) writing code in Python, but wanted to use features in Mathematica like symbolic integration. Turns out Mathematica has a C API that lets you send Math expressions to it and evaluate them. I wrote a clunky Python interface and shared it with Konrad Hinsen, who looked at it and suggested an elegant recursive object representation of Math objects in Python that led to automatic conversion between Python and Mathematica, massively simplifying the code and making it more elegant. I got slightly better at software engineering that day.


I'm trying to be charitable on this code base issue, but the institutions will need to accept that the inputs to policy recommendations must bear more scrutiny.

Both R and Python are taught in high school and undergraduate courses for scientific data analysis. Ferguson and other scientists did not need software engineers. A free co-op student, or three weeks spent learning a language that matched the level of abstraction he was working at, would have sufficed.

The culture that enables that code as described to be acceptable for policymaking is one that intentionally produces complex black boxes that obfuscate risk and attribution, and to launder decision accountability through technology. I've seen this in other institutional code as well.

However, while the policy recommendations that resulted from his model may (or may not) have saved tens of thousands of lives, they did so at the risk of losing the credibility to do it again.

Deflecting blame to nebulous software engineers is disingenuous and serves mainly to exacerbate the suspicions of reasonable people, and further polarize those most harmed by the policy response.


I honestly don't think R and Python would have worked here.

While I was able to clone the repo and run it for Ireland, the UK requires at least 26 GB of RAM, which most personal computers don't have. The US requires much, much more.

And given that it's pretty slow when written in C++, imagine how slow it would have been in R or Python?

I agree with you in principle with respect to this stuff being better, but the incentives skew otherwise at the moment.


Given we're talking about something that people say does not have good code quality, I would guess that whether the C++ implementation is actually more efficient because of the language choice is a wide-open question.


Any serious attempt at modelling this over python would use the pydata stack (numpy, pandas, etc), which run on top of C++ anyways.
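
To illustrate what that means in practice (a toy calculation, nothing from covid-sim): the per-element arithmetic gets pushed down into numpy's compiled inner loops instead of being written by hand.

    import numpy as np

    rng = np.random.default_rng(0)
    susceptible = rng.random(1_000_000)   # toy per-person susceptibility
    exposure = rng.random(1_000_000)      # toy per-person exposure

    # Pure-Python loop: interpreted, one element at a time
    risk_loop = [s * e for s, e in zip(susceptible, exposure)]

    # numpy: the same arithmetic runs in a compiled inner loop,
    # typically orders of magnitude faster
    risk_vec = susceptible * exposure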


Yeah of course, apologies if that wasn't clear.

The best solution here would probably be to package up the core routines into a library and use this from either R or Python.


Sorry if I came off a bit snippy there. But yeah, I assumed you meant Python without numpy, etc.

A lot of the criticism I saw was because the core routines did not need to be packaged up. There were a lot of common data structures reimplemented, etc.

I don't think the model had many novel routines. It could be built just using industry standard and tested tools in python, R, Julia (if you really want speed) etc. But it reinvented the whole ecosystem in one big ball of C.

Tbh, this should have been built on Stan or similar. There are so many variables and assumptions that the output is completely dominated by the parameters chosen. Seeing the distribution of outcomes instead of a point estimate would actually be useful.
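
A minimal sketch of that idea in plain numpy (not Stan, and the priors here are invented purely for illustration): draw the uncertain parameters from distributions, push each draw through the cheap calculation, and report a range rather than one number.

    import numpy as np

    rng = np.random.default_rng(42)
    n_draws = 10_000
    pop = 67e6                                     # approx. UK population

    # Invented priors, for illustration only
    r0 = rng.normal(2.4, 0.4, n_draws)             # basic reproduction number
    ifr = np.clip(rng.normal(0.009, 0.003, n_draws), 0, 1)    # infection fatality rate

    attack_rate = np.clip(1 - 1 / np.maximum(r0, 1.0), 0, 1)  # crude herd-immunity threshold
    deaths = pop * attack_rate * ifr

    print(np.percentile(deaths, [5, 50, 95]))      # a range of outcomes, not a point estimate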


True, I think that's worth noting. To be fair though, this codebase is pretty old, and it's unlikely that the technology landscape looked much like today (especially in terms of R and Python), so I can see how they ended up here.

I'd love it if this were written in R, Python or Stan so I could contribute, but that's probably not the researchers' focus ;)

While Stan is amazing, I shudder to think as to how long this model would have taken to run using MCMC (1 week plus maybe?).


Python is unsuitable for writing Monte-Carlo simulations due to its abysmal performance and its built-in, effectively unfixable limits on parallelism. In addition, it is not a good language in general.

Complex simulations should be written in multiple-dispatch, high-performance languages like Fortran or Julia.


Whether the Imperial code is good or bad doesn't actually matter: you can derive the headline numbers analytically, without any simulation at all [1].
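
For anyone curious what "analytically" can mean here: in a simple SIR-type model the final attack rate z satisfies z = 1 - exp(-R0*z), which a few lines of fixed-point iteration will solve (a sketch of the textbook result, not the calculation in [1]).

    import math

    def final_size(r0, iters=100):
        # Attack rate z solving z = 1 - exp(-r0 * z) for a simple SIR-type model
        z = 0.5
        for _ in range(iters):
            z = 1 - math.exp(-r0 * z)
        return z

    print(final_size(2.4))   # ~0.88: roughly 88% of the population infected if R0 = 2.4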

(The real problem with the worst case estimate is that it assumes people don't individually change their behavior in the face of a pandemic.)

A better critique of the software engineering criticism is [2].

[1] https://twitter.com/trvrb/status/1258879531022082049

[2] https://philbull.wordpress.com/2020/05/10/why-you-can-ignore...


they need to defer to the experts of software development.

I, as a software engineer, wouldn't try to design an epidemic model; I would defer to experts. Epidemic experts shouldn't be writing the implementation of their model; they should defer to the experts.

Also, "It’s you, the software engineering community, that is responsible for tools like C++ that look as if they were designed for shooting yourself in the foot. It’s also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these tools"

That's a bit much. If someone buys a lathe, gets it home, flips the switch and tears off an arm, are machinists to blame?


> Epidemic experts shouldn't be writing the implementation of their model, they should defer to the experts.

Believe me, a lot of scientists would love to. But as pointed out in the OP: it is next to impossible to get funding for a paid software developer position in a research team. It's a problem the scientific community is increasingly aware of, but changing funding guidelines takes a long time. (Source: computational biologist)


That's a fair point. Also, passing judgement on source code is a very easy thing to do. May he who writes/maintains perfect code cast the first stone.


Most scientific software isn't of subtly poor quality. Most scientific software is a stinking, flaming dumpster fire that looks like it was written by packs of drunk kindergartners.


The lathe is almost guaranteed to come with a long list of warnings and an entire section of the manual devoted to safe use. Learn C++ resources tend to not come with those sections in the same detail nor are they usually front and center.


Absolute cringe, this was hard to read. Seems like a blatant shift of responsibility. At what point do people have to take responsibility and stand by their creations, regardless of what has been spoon-fed to them?


I just took a quick skim through the repository and I'm not quite sure what everyone is so upset about. It looks like simulation code.

I would argue that the real problem isn't C++ but tooling that is aimed at producing source code as an artifact, as opposed to repeatable executions as artifacts. There are effectively lots of models in this system, and the code represents all of them tangled together. That means the real artifact is the combination of source code, parameters, tracing, and output.
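
A hedged sketch of what "repeatable executions as artifacts" could look like (the file name and fields are invented for illustration): save the parameters, seed, and code version next to every output so a run can be re-created exactly.

    import json, subprocess, time

    def write_run_manifest(params, seed, outdir="."):
        # Record everything needed to reproduce this run alongside its output
        manifest = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "git_commit": subprocess.run(["git", "rev-parse", "HEAD"],
                                         capture_output=True, text=True).stdout.strip(),
            "seed": seed,
            "parameters": params,
        }
        with open(f"{outdir}/run_manifest.json", "w") as f:
            json.dump(manifest, f, indent=2)

    write_run_manifest({"R0": 2.4, "incubation_days": 5.1}, seed=12345)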


> It’s you, the software engineering community, that is responsible for tools like C++ that look as if they were designed for shooting yourself in the foot.

Wait, what? A fraction of the community might be a pretentious bunch of purists, but I don't think it's fair to criticize them for the tooling selection.

One wouldn't pick an excavator just because it's the industry standard for moving earth, to plant some tulips in a vase.

And even if funding is scarce, Google is free: the first result for planting tulips returns a bulb planter, not a bulldozer.


What we're missing is an "open letter" of scientists criticizing software engineers evaluation of this model.

Because it seems none of these critics has ever coded anything to do with stochastic forecasting or simulations.

"The code produces different results between runs" well, I would be very worried if it didn't, unless your RNG has a fixed seed, and all randomness in your model derives from it (so, if you use multiple threads, there goes your repeatability - which is fine).

Also, you want multiple runs to be different, so you can establish best/worst-case scenarios (also tuning some of the parameters).
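
For what it's worth, a minimal sketch of the fixed-seed approach in Python (not how the Imperial code handles it): one master seed, with an independent deterministic stream per worker, so even multi-threaded runs can stay repeatable if you want them to.

    import numpy as np

    master = np.random.SeedSequence(12345)              # one fixed seed for the whole run
    streams = [np.random.default_rng(s) for s in master.spawn(4)]   # one stream per worker

    # Each worker draws only from its own stream, so the results no longer
    # depend on which thread happens to run first.
    draws = [rng.random(3) for rng in streams]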

Does the model pass basic sniff tests? (For example, no more infected people than the population of a place? Does it follow known pandemic curves? Can the parameters be reasonably tuned to fit its behaviour in existing places?) And yes, the model was checked against a different model, from what I remember.

In essence, if you take Population x (percentage for herd immunity) x IFR, you get a very good estimate of the worst-case scenario for deaths. Then you can see if your model goes to that value given the known parameters.
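
As a worked example with round, hedged numbers (my own illustrative guesses, not the Imperial team's figures):

    population = 67_000_000     # approx. UK population
    herd_immunity = 0.60        # rough threshold, about 1 - 1/R0 for R0 near 2.5
    ifr = 0.009                 # illustrative infection fatality rate

    print(population * herd_immunity * ifr)   # ~362,000 deaths: the ceiling the model should approach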

Yes, the code is ugly. Yes, you have probably written worse code at some point in your career.


I think the reason I find this so objectionable is that he's focused on a tool for doing the job of software engineering, and then blamed software engineers for not making it easy for him.

There's a spot-on comment on there at the moment - this isn't the car that's ready to use by anyone with a license. This is the welding gear we use to put bits of car together. You walked into the machine shop here, used some of our tools and made something that wasn't safe.

If scientists are not software engineers, and do not have the time or the motivation to become software engineers, perhaps they shouldn't play at being software engineers?


> Consider what you, as a client, expect from engineers in other domains. You expect cars to be safe to use by anyone with a driver’s license. You expect household appliances to be safe to use for anyone after a cursory glance at the instruction manuals. It is reasonable then to expect your clients to become proficient in your work just to be able to use your products responsibly? Worse, is it reasonable to make that expectation tacitly?

That doesn't seem like a proper analogy. I certainly would expect a civil engineer to think me an idiot if I tried to do his job for him. And that's how most code written by non software engineers/computer scientists ends up looking.

Software engineers don't produce programming languages, but programs. Languages (and libraries, etc) are our tools, not our end products. Much like an architect's job is not to produce rulers and pencils, but plans.

It just so happens that (more so for computer scientists than for software engineers) we create our own tools.


I get what he’s saying, but the “non-experts” using code to represent their models should also let outside help come when the need arises.

When your model predicts an apocalyptic scenario and the government is taking drastic measures based on it, it's a good time to expose your "non-expert code" to the software engineering communities (and all other associated fields) to take a look.


should also let outside help come when the need arises.

expose your “non-expert code” to the software engineering communities (and all other associated fields) to take a look.

1. Software engineers are expensive. Hiring them to write your code is how you end up needing even more money to do your science, and I think the software engineering world if anything can appreciate prioritizing being scrappy to get more done.

2. Open source is slow and doesn't produce consistent results in the timeframes you need in order to get things implemented in one-offs. In fact, the value you get from others looking at your code is pretty anemic unless other software engineers find your code useful, at which point they have real incentive to help you improve existing functionality instead of reinventing the wheel. I do think code should be published alongside papers as a matter of reproducibility, but I don't think opening up the code beforehand will accomplish much.

Which is all to say that while I agree with you in principle, I don't think your recommendations are practical.

I think this letter has it right: we should make better tools to help non-experts do less foot shooting. C++ is very, very foot shooty, and the usual answer of "well get better at C++" is a non-starter for non-experts.

I think there are other solutions too - software engineers with partial specializations in academic fields, volunteers, etc.


I believe this model is quite old (at least in some form) so there have been opportunities to review it. I confess I haven't looked, but I haven't heard any defects have actually been identified. If the model stood up to peer review I assume the results it produces are at least consistent with the expectations of the people who wrote the mathematical model.

Hopefully this will be a watershed moment that makes it easier to cost a research software engineer onto a grant in the future.


Many defects have been identified. A small collection are linked to from here:

https://lockdownsceptics.org/second-analysis-of-fergusons-mo...


I will personally pay for a professional review of the next model the British government uses to guide lockdowns. Cost problem solved.


While I appreciate the sentiment, we can probably both see why one person offering to solve the problem for one study isn't a solution to the systemic problem.

Private funding poses all sorts of issues. Surmountable issues, but still issues once individuals are paying for reviews at scale.


This is starting to happen through Research Software Engineers, although it is still a (relatively) new movement within academia.

https://society-rse.org/ https://www.software.ac.uk/


I looked at the job vacancies posted on the Society for Research Software Engineering. Those with salaries listed:

Senior RSE, London: £35,965 to £52,701

Web Application Developer, London: £35,965 to £43,470

Research Software Engineer, Hannover: Salary Scale 13 TV-L (AFAICT this is €41259 to €59545)

I'm the type to be happy to take a pay cut, especially if there are work-life advantages involved. But an 80%-90% cut is asking a bit much. It's not like London's a particularly cheap place to live, either.

Not sure how to fix it.


There is probably no funding for outside help... So they get a project, fund a PhD student (or postdoc) with it; that person has to produce new results (fast) and more results, then they leave and the next people come in.

Generally, a simulation group would need an experienced programmer maintaining the code base, (helping with) cleaning up the others' ideas before handover, and this as a permanent position. Experienced C++ also means not cheap. Go to your university admins and mention both shocking words (not cheap, and permanent without the magic word 'professor') in one sentence; maybe they will recover from the shock in a few days.


I find this a pretty weak excuse for a high-impact epidemiological model like this. Universities are full of the brightest people you will find anywhere in society, and most universities have faculties and courses in computer science and/or software engineering. How about enforcing multi-disciplinary efforts, having computer science students work with microbiologists, and vice versa? Exchange programs between universities that specialize in one or the other?

University graduates need to do internships and a thesis anyway, so in terms of 'funding' you could argue they are basically 'free' in terms of assigning them to multi-disciplinary research.

I know I would have been delighted if someone at my university had offered me an interesting multi-disciplinary microbiology project when I was studying computer science. Instead we mostly just got boring run-of-the-mill exercises in parroting established computer-science theory...


It wasn't meant as an excuse, just to point out what would effectively have to be done to solve this in the future.

I am a bit sceptical of your suggestion of putting more non-permanent staff (even from CS) into the pot.


I don't think the government is taking drastic measures based solely on this code. It's just a way to model facts we already know. We know that viruses spread exponentially, and we know what exponential growth looks like; this just lets us model different assumptions to see how they affect the outcome.
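
As a toy illustration of that point (made-up numbers, nothing to do with the Imperial model): a few lines of arithmetic already show how sensitive the trajectory is to the assumed doubling time.

    # Cases after 30 days of unchecked exponential growth, starting from 100 cases,
    # under a few different assumed doubling times.
    for doubling_days in (3, 5, 7):
        cases = 100 * 2 ** (30 / doubling_days)
        print(doubling_days, round(cases))   # 3 days -> ~102,400; 7 days -> ~2,000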


The model is way more granular than that. Have you read it?


Sorry I guess I wasn't clear. My point is that if this software was destroyed a year ago our response to the coronavirus would be substantially the same.

Basically that we are not relying on this model exclusively or even substantially to determine policy.


The scientific method is supposed to be reproducible. Others should be able to follow the steps and arrive at the same outcome. Scientists working in wet labs are not publishing papers on what they do in a language only they can understand and expecting others to blindly trust the results and conclusions.


This is definitely happening, except the language is hard-to-follow Excel.


Bridges are designed, built, tested, and verified as fit for purpose. Planes are designed, built, tested, and verified as fit for purpose. Why should software models that inform policy be excluded from those requirements? What is the expected cost of failure - it seems that critical risk controls are missing from academia - in the case that they are wrong.

The case in hand is problematic because it highlights the lack of controls that should probably be present when building models and simulations to inform policy.

Quality control is important in nearly every other industry; the lack of quality controls in academia appears to me to be the root cause, rather than particular language/build choices.


With all due respect, Konrad can go f himself. There are plenty of performant, non-foot-gun alternatives to C++ these days. The choice of that language, and any abuses of it, fall entirely upon Ferguson and his team.

And for this nugget:

It’s also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these tools.

I say he deserves it without lube. Hardly a day goes by on Lobsters or HN where we (including many C++ devs) don’t complain at great length on what a reeking dumpster fire C++ is.


That is a reasonable problem statement and proposed solution, but the problems with software engineering go very deep. That is why so many software projects fail outright. Just as we acknowledge that aviation engineering has different risks and constraints from basic tool building, we all need to understand that, at least for now and likely for some time, software development will be inherently messy and risky for all who dare it.


As a researcher, if I want to find a software engineer willing to review my code for free (I have no budget for this), how should I find one?

The article says

> We can’t ask software experts for a code review every time we do something important.

but I think there are people who'd be willing to give at least 15 minutes of their time to review scientific software once.


The author makes it sound like a warning label is missing from the C++ tin. Maybe. But what tool should he have used? I haven't seen this code but is there any doubt it would look just as bad in Java or Python, maybe with fewer segfaults? Or FORTRAN.


Well, according to some reviews the model did run into problems specific to amateurish use of C/C++. Specifically, it produced different results when running on CPUs with different numbers of cores, and even on a single core across different CPUs.
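
I can't say whether that is the mechanism in this code base, but one common way it happens is that parallel code sums floating-point numbers in a thread-dependent order, and float addition isn't associative. A toy illustration:

    import random

    values = [random.uniform(-1e6, 1e6) for _ in range(100_000)]

    total_a = sum(values)
    random.shuffle(values)       # same numbers, different summation order
    total_b = sum(values)

    print(total_a == total_b)    # frequently False: float addition is not associative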


I'm genuinely surprised it wasn't an Excel spreadsheet.


A well executed Excel model is much easier to explain to a non-technical person.

To be frank, for all the snobbery towards Excel, it has done a marvelous job at getting millions of people to think in more quantitative ways instead of "business acumen".



I haven't seen the code, but from what I read, it may well have been better to have done this in an Excel spreadsheet.

Of course we all would have been just as horrified at that.


> A clear message saying “Unless you are willing to train for many years to become a software engineer yourself, this tool is not for you.”

If you say anything along those lines, you'll be dismissed as a "gatekeeper".


Could someone give a short example of some ugly code in this repo?

I've quickly looked and I did not find anything clearly uglier than what I have seen in 99% of online repositories...

Quite the contrary, to be honest.


Laughable nonsense, and especially glib when there are lives on the line due to the question of competence/lack of it.

One thing I'm surprised not to have seen yet - as an MSc Geographic Information Science grad - is criticism of the methodology from a GI science perspective. And the question that I have is: where is the proof that this model's results are in any way representative of reality? Surely Professor Ferguson - before releasing results that have the potential to turn economies upside down, and with thousands of lives at stake - would have run this against some real-world data to validate the model. Or not? If not, why should we listen to anything he says ever again?


Publicly funded science should in almost all cases make the software underpinning it publicly available. Otherwise, how are results reproducible, verifiable or open to peer review?


Instead of complaining about code quality, shouldn't SEs jump in and support academia in writing better code?


What a collection of terrible arguments. I'm afraid I can't help myself here. I've got to eviscerate this USENET-style:

> "The scientists who wrote this horrible code most probably had no training in software engineering, and no funding to hire software engineers. And the senior or former scientists who decided to give tax-payer money to this research group are probably even more ignorant of the importance of code for science. Otherwise they would surely have attributed money for software development, and verified the application of best practices."

Let's start with the observation that this is indeed the main problem here. Planning research that requires software to be developed, without accounting for software development, is a terrible idea. There's a good reason why scientific programmers exist: to help researchers write the code they need for their scientific projects. Hire one if you need one. Don't blame someone else if you forget to do so.

> "It’s you, the software engineering community, that is responsible for tools like C++ that look as if they were designed for shooting yourself in the foot. It’s also you, the software engineering community, that has made no effort to warn the non-expert public of the dangers of these tools."

Excuse me? The internet is riddled with jokes about how easy it is to shoot yourself in the foot with C++. Of course you can shoot yourself in the foot with any programming language, but C++ excels at it. Many programmers avoid it because they don't want to have to manage their own memory. Follow their example and use something that focuses on the problem area you want to focus on.

> "You know, the kind of warning that every instruction manual for a microwave oven starts with: don’t use this to dry your dog after a bath."

Anyone who needs that kind of warning is a danger to themselves and others. A scientist who lacks this level of common sense should seek guidance from someone who has it.

> "A clear message saying “Unless you are willing to train for many years to become a software engineer yourself, this tool is not for you.”"

Software engineering is a serious profession that requires lots of training. Should any idiot be able to whip up their own epidemic simulation without knowing what they're doing and expect reasonable results?

> "But power comes with responsibility. If you want scientists to construct reliable implementations of models that matter for public health decisions, the best you can do is make good tools for that task, but the very least you must do is put clear warning signs on tools that you do not want scientists to use"

Is it unreasonable to expect someone who needs tools either to research what tools are suitable for their needs, or to ask advice from an expert on those tools? Nobody grabs just a random tool from their toolbox to solve a specific problem. If it's a nail, you grab a hammer; if it's a screw, you use a screwdriver.

> "scientists are not software engineers, and have neither the time nor the motivation to become software engineers."

Then hire one. Don't blame your lack of motivation on others.

> "Consider what you, as a client, expect from engineers in other domains. You expect cars to be safe to use by anyone with a driver’s license."

Yeah, with a driver's license. We don't let random idiots drive off in a car, we expect them to learn how to drive first. If you want to use C++, you're going to have to learn about memory management. If you don't want to do that, get a different language.

> "You expect household appliances to be safe to use for anyone after a cursory glance at the instruction manuals."

Is that cursory glance going to be enough to tell them not to put their dog in a microwave? At least C++ is not going to kill your dog.

> "It is reasonable then to expect your clients to become proficient in your work just to be able to use your products responsibly?"

They can use our products just fine, but if they want to use our tools, they need to learn how to use them.

Should toolboxes now contain warning labels not to build your own car from scratch? Or not to use these tools to repair a nuclear reactor? It's absolutely valuable to learn how to use tools, but if you just grab random power tools without knowing what you're doing, and without being willing to learn to use them, you're likely to lose a limb.

Finally, if you're really looking for a programming language to let scientists play around with, try Python. It's designed to be easy to learn, and it's also very suitable for all sorts of scientific modeling. There's a good reason it's popular in research. I still recommend putting some effort in learning to use it.


C++ is hard, therefore they're allowed to publish such rubbish and destroy trillions of dollars as a consequence? Critical code is written all the time; it can be expected when decisions of this magnitude are at stake.


Everything is fine. You get what you pay for.


Oh no, not the heckin scientists. They are always right! Dang it guys, it's our fault, not Science. I just wish it was 1801 again so we could parade around the streets once more in worship of the god of Rationality!


Look, the software engineering community has switched to Rust by now. If you're still using C++, that's your problem. Don't blame us.


So the government can spend billions and trillions on implementing COVID-19-related policies, but can't spend any money on getting good data for the same? The fact that government policy is driven by some unpublished, undocumented, non-repeatable code, affecting the lives of an entire nation, should be treated as a national defence issue. Anyone saying the person writing this was not a software engineer is giving a poor excuse. Government shouldn't appoint random people to do random jobs and, in case of poor results, say they weren't a trained XYZ. The people deserve a better process.



