Lessons from COVID-19 modeling: the interplay of data, models and behaviour

May 12, 2020

Even when models are "wrong," they can still be useful – especially if they changed behaviour. Image: REUTERS/Hannah McKay - RC2NHG9PAZKK

Kay Firth-Butterfield

Senior Research Fellow, University of Texas at Austin

Anand Rao

Global Leader, Artificial Intelligence, PwC

The whole world is familiar with COVID-19 infection and death models and the "flatten the curve" concept, but it's difficult to know which models we can trust.
Models have dramatically changed the behaviour of health officials, policymakers and citizens.
These behaviour changes have reduced the number of deaths, which makes the estimates in the original models seem inaccurate.

In the history of humanity, perhaps no data models have been more recognizable than COVID-19’s infection and death curves. Virtually everyone, from the farmer in India to the director of the Centers for Disease Control and Prevention (CDC) in the US, is now familiar with them.

But the models have also drawn criticism for inconsistencies. One early model, developed by Imperial College London, predicted more than 2.2 million deaths in the US, but subsequent models (e.g., from the University of Washington’s Institute for Health Metrics and Evaluation, or IHME) projected significantly fewer deaths. Which models can we trust?

As COVID-19 continues to race around the world at a frightening and ferocious pace – often spreading misinformation along with it – governments, organizations and people are clamoring for the truth about safeguards, how we can fight back effectively and how long the nightmare will last. To get these answers, we need accurate, comprehensive models. And to get those models, we need to look at how, exactly, models work.

“All models are wrong, but some are useful.”

This statement is attributed to British statistician George Box, who later wrote, “All models are approximations.”

So, the question we need to ask is not, “Is the model true?” (It never is.) We need to ask, “Is the model good enough for this particular application?”

A model, by its very definition, is a representation of a system that highlights certain components and ignores others. Hence, it can never reflect all aspects of reality.

However, it is still instructive to understand when and why we get models wrong.

Models are based on assumptions about what needs to be included and excluded from reality. They are also based on assumptions about how different components of the model interact. For example, there is a family of epidemiological models that include some aspects of the disease and ignore others. SIR models divide a population into Susceptible, Infected and Recovered (SIR) individuals, while SIS models move individuals from Susceptible to Infected and back to Susceptible (SIS), as with recurrences of the common cold. SIRD models add “Deceased” to SIR models, and SEIR models add an “Exposed” stage for infectious diseases that have an incubation period. SEIR and SIRD models are the two types commonly being used for COVID-19.
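To make the idea of a compartment model concrete, here is a minimal sketch of the simplest member of the family, an SIR model, written in plain Python. The function name, the forward Euler time-stepping and all parameter values are assumptions chosen for illustration; they are not fitted to COVID-19 data and are not taken from any of the models discussed in this article.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    beta  : transmission rate per day (illustrative, not fitted)
    gamma : recovery rate per day (roughly 1 / infectious period)
    Returns a list of (day, S, I, R) tuples, one entry per whole day.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # New infections depend on contacts between S and I.
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical inputs: one million people, ten initial infections.
history = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=160)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Peak of about {peak_infected:,.0f} infected on day {peak_day}")
```

Everything the model "knows" about the disease sits in two numbers, `beta` and `gamma`. Adding a compartment (Exposed for SEIR, Deceased for SIRD) means adding one more variable and one more rate, which is exactly the kind of include-or-exclude decision described above.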

Data is critical to build models and validate their accuracy – and feeding models inaccurate data will produce inaccurate results. In the case of COVID-19, we need to feed models the number of cases, hospitalizations and deaths due to the coronavirus at a national, state and/or county level. We might have incomplete data (e.g., remote areas may have difficulty collecting and sharing data) or inaccurate data (e.g., in the early stages of a pandemic, the deaths may be associated with other secondary conditions and may have been miscategorized).

Finally, despite our best efforts, there is uncertainty – aspects of the model we don’t know and may never know with certainty. There is still a lot of uncertainty around COVID-19’s infection rate, incubation period and recovery rate — all of which impact the reproduction rate. Furthermore, we still don’t know the impact of the virus on different segments of the population. In addition to disease uncertainty, we also have policy and behaviour uncertainty. We cannot say how different governments and institutions will intervene in this crisis, or how citizens and employees will behave in these stressful circumstances.
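The effect of this parameter uncertainty can be sketched with a toy experiment: run the same simple SIR model over a small grid of plausible-looking infection and recovery rates and compare the outcomes. The helper names and the specific rates below are hypothetical and purely illustrative. The ratio `beta / gamma` is the basic reproduction number for this simple SIR model only; richer models define it differently.

```python
from itertools import product


def peak_infected(population, initial_infected, beta, gamma, days=365, dt=0.1):
    """Return the largest number of simultaneously infected people
    in a basic SIR model integrated with forward Euler steps."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    peak = i
    for _ in range(round(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak


def projection_range(population, initial_infected, betas, gammas):
    """Run every (beta, gamma) combination and collect the results."""
    results = {}
    for beta, gamma in product(betas, gammas):
        results[(beta, gamma)] = {
            "r0": beta / gamma,  # reproduction number of this toy model
            "peak": peak_infected(population, initial_infected, beta, gamma),
        }
    return results


# Illustrative low / middle / high guesses, not estimates for COVID-19.
scenarios = projection_range(
    1_000_000, 10, betas=(0.2, 0.3, 0.4), gammas=(0.07, 0.1, 0.14)
)
peaks = [result["peak"] for result in scenarios.values()]
print(f"Projected peak infections range from {min(peaks):,.0f} to {max(peaks):,.0f}")
```

Even modest disagreement about two inputs produces a wide spread of projected peaks, which is one reason models in this situation tend to report ranges rather than single numbers.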

So, if models can be wrong, why build them? What purpose do they serve?

Models have traditionally been built with the following uses in mind:

Explanation. Many models are built to explain a certain phenomenon, abstract from the “messiness” of the real world or draw insights that can be used in future scenarios. For example, building models to understand the reproduction rate of previous outbreaks of infectious diseases like Ebola and avian flu can help develop interventions. Knowing how SARS-CoV (Severe Acute Respiratory Syndrome) spread in multiple countries in 2003 can provide useful clues on COVID-19, or SARS-CoV-2.

Projection. Models are used to predict the future when we have sufficient historical data and are in a relatively stable environment. However, when uncertainty is high, models are often used to project a range of outcomes reflecting the inherent uncertainty. As we learn more about the virus and our reaction to it, we can decrease this uncertainty. Current COVID-19 models typically provide a range for the number of cases, hospitalizations and deaths.

Behaviour change. Models can be used to change the behaviour of people. We are all familiar with models that make recommendations as to which books we should read and what products we should buy. Similarly, COVID-19 models have changed attitudes and behaviours of health officials, policymakers, government institutions and citizens.

So, while we could argue that the COVID-19 models are “wrong,” they have still proved useful.

Some models are useful, and a few change behaviours.

In just three months, behaviour has changed dramatically across all segments of society, from government officials and policymakers to ordinary citizens.

Health officials and policymakers responded to projections of COVID-19 models — however uncertain they were — by taking remedial measures. The Coronavirus Government Response Tracker published the Stringency Index, which examines 13 measures in response to the virus, including school and workplace closures, cancellation of public events and travel restrictions – measures that would be considered draconian under any other circumstances.

In response to government interventions, citizens have largely complied with restrictions and changed behaviours. They are traveling less, sheltering at home, social distancing and being more conscious of disinfection. They have also changed purchase behaviour. They are shopping online more rather than going to physical stores, and they are consuming more bandwidth as social interactions and entertainment have largely moved online.

These changes in behaviour have naturally impacted the key parameters of COVID-19, thereby changing the data.
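A toy SIRD model shows the mechanism. In the sketch below, the transmission rate is a function of time, so a behaviour change can be represented as a drop in that rate from a chosen day onward. The function names, the day of the change and every rate are assumptions made for illustration, not estimates for COVID-19.

```python
def simulate_sird(population, initial_infected, beta_on_day, gamma, mu, days, dt=0.1):
    """Integrate a basic SIRD model with forward Euler steps.

    beta_on_day : function mapping a day number to a transmission rate
    gamma       : recovery rate per day
    mu          : death rate per day among the infected
    Returns cumulative deaths and the peak number of infected people.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    d = 0.0
    peak = i
    steps_per_day = round(1 / dt)
    for day in range(days):
        beta = beta_on_day(day)  # behaviour enters the model here
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            new_deaths = mu * i * dt
            s -= new_infections
            i += new_infections - new_recoveries - new_deaths
            r += new_recoveries
            d += new_deaths
            peak = max(peak, i)
    return {"deaths": d, "peak_infected": peak}


def no_change(day):
    """Hypothetical scenario: contact patterns never change."""
    return 0.3


def distancing_from_day_40(day):
    """Hypothetical scenario: contacts fall sharply after day 40."""
    return 0.3 if day < 40 else 0.12


common = dict(population=1_000_000, initial_infected=10, gamma=0.1, mu=0.001, days=300)
original_projection = simulate_sird(beta_on_day=no_change, **common)
after_behaviour_change = simulate_sird(beta_on_day=distancing_from_day_40, **common)

print(f"Deaths with no change in behaviour: {original_projection['deaths']:,.0f}")
print(f"Deaths with distancing from day 40: {after_behaviour_change['deaths']:,.0f}")
```

The first run plays the role of an early projection and the second plays the role of what is later observed. The model structure is identical in both; only the behaviour-dependent input differs, yet the first run looks far too pessimistic in hindsight.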

When behaviours change, new data trumps models.

COVID-19 infection, hospitalization and death curves were amplified by the media in all affected countries as citizens were urged to “shelter at home” and “flatten the curve.” As a result of behavioural changes, two sets of data also changed.

Case data. As more people learned about the virus and received the message to socially distance, they heeded safety warnings to varying degrees: some completely, some partially and some not at all. Individual choices depended on a variety of factors: age, societal values (deference to authority or libertarianism) and economic necessity to work, among others. On balance, new cases and hospitalizations started dropping in different countries and US states. This had the intended effect of reducing deaths, but also, perversely, made the original projections seem unrealistically pessimistic (wrong), thereby opening up the model developers to criticism.

Unemployment data. Interventions such as closing businesses and schools shut down economic activity and resulted in a massive reduction in the demand for a large number of goods and services. This also increased unemployment to record levels in the US and other nations, to numbers not seen since the Great Depression that began in 1929. This, in turn, led to calls to reopen the economy. While economic activity does need to be rebooted, the manner in which we do it is likely to play a large role in the total number of deaths resulting from this pandemic.

The impact of these interventions is what led Dr. Anthony Fauci, Director of the National Institute of Allergy and Infectious Diseases, to comment, “When real data comes in, then data, in my mind, always trumps any model.”

COVID-19 models are initially built with whatever data we have, along with assumptions about the progression of the disease. These models highlight the range of possible outcomes, leading policymakers and citizens to change behaviour. As a result, the actual number of deaths is significantly lower than what was originally estimated.

The models served their purpose by reducing the number of deaths, so they should be viewed as successes.

The counterfactual question – “If we had not taken any action, how accurate would the models have been?” – is one we could theoretically answer. But the answer is far too expensive in terms of human lives to pursue.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.