Modelling the disease spread.
Link to the DataWondering.com blog post
The results of the first 50 days of infection spread modelling (cities and directions of infection spread).
Link to video presentation from the OpenDataDay conference (Russian only)
The approach combines two general strategies for infection modelling: a Susceptible-Infectious-Recovered/Removed (SIR) model for the spread within each city, and a simultaneous model of the spread of the disease through the air-traffic network.
Algorithm pseudocode:
- Initialize `INFECTED_CITIES` with Wuhan
- For `day` in `simulation_days`:
  - For `infected_city` in `INFECTED_CITIES`:
    - Get all `airports` of the `infected_city`
    - Get all `connections` for the `airports`
    - For `susceptible_city` in `connections`:
      - Calculate the probability of infection of `susceptible_city`
      - If `susceptible_city` is infected, update `INFECTED_CITIES`
  - Next `day`
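The loop above can be sketched in Python as follows. This is a minimal sketch, not the project's code: the network is described by three hypothetical dictionaries (`city_airports`, `airport_connections`, `airport_city`), and the caller supplies a hypothetical `infection_probability(source, target, day)` function. Turning that probability into an infection event with a random draw is our assumption.

```python
import random
from typing import Callable, Dict, List


def simulate_spread(
    start_city: str,
    city_airports: Dict[str, List[str]],        # hypothetical: city -> its airports
    airport_connections: Dict[str, List[str]],  # hypothetical: airport -> destination airports
    airport_city: Dict[str, str],               # hypothetical: airport -> city it serves
    infection_probability: Callable[[str, str, int], float],
    simulation_days: int,
    seed: int = 0,
) -> Dict[str, int]:
    """Return the day on which each city became infected (start city = day 0)."""
    rng = random.Random(seed)
    infected_on: Dict[str, int] = {start_city: 0}
    for day in range(1, simulation_days + 1):
        # Snapshot: cities infected today only start spreading on the next day.
        for infected_city in list(infected_on):
            for airport in city_airports.get(infected_city, []):
                for destination in airport_connections.get(airport, []):
                    susceptible_city = airport_city[destination]
                    if susceptible_city in infected_on:
                        continue  # already infected (or the source city itself)
                    p = infection_probability(infected_city, susceptible_city, day)
                    # Assumption: infection happens if a uniform draw falls below p.
                    if rng.random() < p:
                        infected_on[susceptible_city] = day
    return infected_on
```

On a toy chain Wuhan → Beijing → Paris with a probability of 1.0 on every route, Beijing is infected on day 1 and Paris on day 2; with a probability of 0.0 only Wuhan stays infected.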
To model the spread of infection within a particular city, we use a homogeneous Susceptible-Infectious-Recovered/Removed (SIR) model with several assumptions. Although quite simplistic, the model proves reasonable for approximating the COVID-19 infection spread, for several reasons:
- A person becomes infectious already during the incubation period (source: Johns Hopkins University). This means there is a direct transition from Susceptible to Infectious, bypassing the Exposed compartment of the SEIR model.
- There is no vaccine at the moment, so the disease cannot be contained through vaccination-based herd immunity. For the SIR model, this means that the entire city population is susceptible unless strict quarantine is enforced (more on that later).
- The long incubation period (14 days median, Ibid) and the asymptomatic course in the majority of infected people allow the disease to spread undetected until the first symptomatic cases are detected and tested. This, again, matches the initial dynamics of the SIR model.
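For reference, these assumptions map onto the standard continuous-time SIR equations below. The symbols β (transmission rate), γ (recovery/removal rate), I₀ (initially infected) and 𝓡 (reproduction number, written calligraphically to avoid a clash with the Removed compartment R) are our notation, not taken from the project. There is no Exposed compartment, and the whole population except I₀ starts out susceptible:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad \mathcal{R} = \frac{\beta}{\gamma}, \quad S(0) = N - I_0,\; I(0) = I_0,\; R(0) = 0.
$$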
The key idea we implemented to account for changes in the infection rate due to social distancing and quarantine measures is to model the reproduction number R dynamically. The idea is straightforward: adjust R in response to the preventive measures. As a baseline, we took the Wuhan example of preventive measures and their approximate timelines.
- During the first days, the infection spreads largely undetected, so R is close to its upper bound.
- On average, after the median incubation period of 14 days, the first social distancing measures are introduced, which drives R down to its average value.
- After approximately one month, strict quarantine measures are enforced, including travel bans and area lockdowns. This drops R to its minimum value.
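The three phases can be sketched as a piecewise-constant function of the number of days since a city's first infection. The switch days (14 and 30) follow the timeline above; the name `reproduction_number` and the three R values are placeholders chosen for illustration, not the values used in the original model.

```python
def reproduction_number(
    day: int,
    r_max: float = 3.0,        # assumed upper bound: undetected spread
    r_avg: float = 1.5,        # assumed average value: social distancing
    r_min: float = 0.5,        # assumed minimum value: strict quarantine
    distancing_day: int = 14,  # median incubation period, as in the text
    quarantine_day: int = 30,  # roughly one month, as in the text
) -> float:
    """Piecewise-constant R for a city, `day` counted from its first infection."""
    if day < distancing_day:
        return r_max
    if day < quarantine_day:
        return r_avg
    return r_min
```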
Finally, we run an SIR model for each infected city to obtain the number of infected people on each day of the simulation.
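A minimal daily-step (forward Euler) discretisation of the SIR model with a time-varying R might look like this, assuming β(t) = R(t)·γ. The function name `run_sir`, the Euler scheme and the `infectious_period` parameter (1/γ) are our illustrative choices; the text does not state which integration scheme or infectious period the project used.

```python
from typing import Callable, List


def run_sir(
    population: float,
    initial_infected: float,
    days: int,
    r_of_day: Callable[[int], float],  # reproduction number for each day
    infectious_period: float,          # assumed mean infectious period, 1/gamma
) -> List[float]:
    """Return the number of currently infected people for day 0..days."""
    gamma = 1.0 / infectious_period
    s = population - initial_infected  # everyone else is susceptible (no vaccine)
    i = initial_infected
    r = 0.0
    infected = [i]
    for day in range(days):
        beta = r_of_day(day) * gamma
        # Clamp so we never move more people than are still susceptible.
        new_infections = min(beta * s * i / population, s)
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        infected.append(i)
    return infected
```

Passing `lambda day: 2.5` gives a constant-R baseline, while a piecewise function of `day` reproduces the three-phase schedule described above.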
To model the infection spread through the airline traffic network, we need to calculate the probability that a given susceptible city is infected by a neighbouring infected city on a given day.
We consider a city infected if at least one infected plane has landed in it. Hence, we first need to calculate the probability that a plane coming from the infected city is itself infected:
where I is the number of infected people in the city and N is the total population of the city.
Next, we can calculate the probability that the city is infected:
where f is the number of flights from the city per day.
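The sketch below is written under our own assumptions and may differ from the project's formulas: a plane is taken to be infected with probability equal to the infected share of the city, I/N, and the f daily flights are treated as independent, so the susceptible city is infected with probability 1 − (1 − I/N)^f, i.e. the chance that at least one plane is infected. Both function names are hypothetical.

```python
def plane_infection_probability(infected: float, population: float) -> float:
    # Assumption: P(plane infected) = I / N, the infected share of the city.
    return infected / population


def city_infection_probability(
    infected: float, population: float, flights_per_day: int
) -> float:
    # Assumption: flights are independent, so
    # P(at least one infected plane) = 1 - (1 - p_plane) ** f.
    p_plane = plane_infection_probability(infected, population)
    return 1.0 - (1.0 - p_plane) ** flights_per_day
```

For example, with 100 infected people in a city of 1,000 and 2 flights per day, the sketch gives 1 − 0.9² = 0.19.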
Each simulated day, we recalculate the probabilities of infection spread based on the estimated number of infected people in the infected cities. This approach proved surprisingly accurate and was able to "predict" major COVID-19 outbreaks, e.g. in Western Europe and the USA.