Chapter 1: Compartmental Models in Epidemiology
Disease modeling is quite complex. To begin with, it involves modeling entire populations in an ever-changing environment. Identifying which groups will contract a virus and how many people will have the virus at any given timestep is as much a computational challenge as it is a mathematical one.
At its core, however, any disease outbreak can be reduced to a simplified scenario that is tractable to model, and that is where infectious disease modeling begins.
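Before starting with epispot, let's take a moment to observe simple ways to model infectious diseases, without the code and math behind compartmental models. The simplest way to do this, of course, would be through the use of simulations. For our simulations, we will need to group individuals into three categories:
- Susceptible: people who have not had the disease and can still catch it
- Infected: people who currently have the disease and can spread it
- Removed: people who have had the disease and can no longer catch or spread it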
Now, we define a simulation in which each 'person' is a dot moving around a screen. In the beginning, everyone except one person is susceptible. The first person with the disease is often referred to as 'Patient Zero.' This patient will now go on to infect anyone they come in contact with (i.e. any dot in the same position as them). After a given amount of time, each dot will recover.
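The simulation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind any published simulation: the grid size, the number of dots, the fixed recovery time, and the function name `simulate_dots` are all assumptions chosen for the example.

```python
import random


def simulate_dots(n_people=200, grid=20, recovery_steps=15, n_steps=100, seed=0):
    """Random-walk sketch of the dot simulation (all parameters are illustrative)."""
    rng = random.Random(seed)
    # Each dot has a position on a wrap-around grid and a state: 'S', 'I', or 'R'.
    pos = [(rng.randrange(grid), rng.randrange(grid)) for _ in range(n_people)]
    state = ["S"] * n_people
    state[0] = "I"  # Patient Zero
    sick_for = [0] * n_people
    history = []
    for _ in range(n_steps):
        # Move every dot by at most one cell in each direction.
        pos = [
            ((x + rng.choice((-1, 0, 1))) % grid, (y + rng.choice((-1, 0, 1))) % grid)
            for x, y in pos
        ]
        # A susceptible dot sharing a cell with an infected dot becomes infected.
        infected_cells = {pos[i] for i in range(n_people) if state[i] == "I"}
        newly_infected = [
            i for i in range(n_people) if state[i] == "S" and pos[i] in infected_cells
        ]
        # Infected dots are removed after a fixed number of steps.
        for i in range(n_people):
            if state[i] == "I":
                sick_for[i] += 1
                if sick_for[i] >= recovery_steps:
                    state[i] = "R"
        for i in newly_infected:
            state[i] = "I"
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


history = simulate_dots(seed=1)
print("final (S, I, R) counts:", history[-1])
```

Running it with different `seed` values gives different outbreaks, which is the randomness that the dots on screen make visible.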
An excellent example of this can be seen in this Washington Post article by @HarryStevens. Although the simulation is small (only around 200 people), it gives a simple and intuitive feel for how a virus spreads throughout a population. Here's a sample of what it looks like:
Figure: An array of dots shows susceptible, infected, and removed people in different colors; most cases are clustered around one location.
- Blue = Susceptible
- Orange = Infected
- Pink = Removed
While a simulation can give us an intuitive feel for how a disease might spread, in order to actually analyze the results we're going to need data. And in order to get data we need an equation that can accurately represent the simulation above.
Before we derive this equation, though, it's important to understand the different types of models in epidemiology. Here are the main types:
Similar to the simulation we saw above, agent-based modeling is the process of using 'agents,' or some representation of people, to run a more complex simulation that will eventually give results about how a disease will spread.
Agent-based modeling, however, typically involves more complex tools than just animating dots. Most agent-based models will use population structures to model which classes of individuals will likely interact with other classes. The benefit of agent-based modeling is that it is stochastic. By stochastic, we mean that the model generates different results each time, just like a real epidemic. Because there is no guarantee that any given person will be infected, real epidemics are indeed random. Additionally, agent-based models allow for more detailed analysis than other model types.
However, their largest downside is computational inefficiency. Agent-based models are not computationally efficient to implement, especially for large population sizes (imagine running the simulation above with 50,000 dots—and that's just the population of a small town).
Compartmental models, on the other hand, offer a nice balance between computational efficiency and mathematical precision. While they don't allow for precise per-demographic information, they can give quick and accurate estimates with some very simple (and elegant) math.
The idea behind all compartmental models is a compartment. Compartments, like we saw with our simulation, are essentially categories for grouping members of a population. Compartmental models use multiple compartments to model interactions between different compartments. For example, compartmental models will track the number of people in the susceptible compartment moving into the infected compartment because they were infected. By doing this at each timestep, compartmental models can keep track of different classes of individuals and generate estimates of the number of people in each compartment at any given timestep.
The main downside of compartmental models, aside from less detailed information, is that they are deterministic, that is to say, the opposite of stochastic. Essentially, compartmental models will give the same result every time and cannot generate a probability distribution, unlike agent-based models. However, their speed and efficiency more than make up for this.
The SIR model, which stands for Susceptible, Infected, Recovered, is one of the most widely used epidemiological models. It is among the simplest models that capture the course of a disease outbreak through a system of ordinary differential equations.
The motivation for the SIR model may seem unclear at first since simulations, like those in 1.1, offer a much more visual and stochastic perspective than a system of ordinary differential equations. After all, in the simulation you can observe each individual dot and its contacts, as well as retain the stochastic nature of disease outbreaks.
While simulations are ideal in principle, they lack many qualities that would be necessary for them to be used in actual modeling. Firstly, they are extremely computationally expensive (imagine running a simulation with upwards of 1 million dots for large cities). Secondly, and more importantly, they lack important constructs in epidemiology (explained later in 1.3.2), so the figures produced from such models are often incorrect and less precise than those from traditional modeling techniques (imagine how much more complex urban mobility is than the random movement in our simulations).
For these reasons, we turn to compartmental models, which not only offer more precise measurements but also strip down disease modeling to its core to achieve blazing fast modeling speeds.
In order to boil down disease modeling to its fundamental properties, we introduce the following definitions that will prove useful in 1.3.3 where we derive the actual equations for the SIR model. Later (in 1.3.4), we'll discuss how to merge the following two parameters into just one for the SIR model.
1. Beta (β)
At the beginning of an outbreak, when everyone is susceptible, this parameter gives the number of susceptibles being infected by one infected per unit time. However, as people begin to get infected, beta no longer represents this quantity since an infected cannot infect another infected. At this stage in the outbreak, beta becomes a theoretical, but still important, quantity.
It is also worth pointing out that beta can and does change during the course of an outbreak. Measures like social distancing, quarantines, and improved hygiene can reduce this quantity as infecteds come into contact with fewer people, while festivals and large events can increase it.
2. Gamma (γ)
Gamma essentially tracks the inverse of the recovery time. In compartmental models, this is known as a rate—remember this, as it will come in handy later in more complex models. Higher values of gamma indicate lower recovery times and lower values of gamma indicate higher recovery times.
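As a quick illustration of the relationship between gamma and recovery time, the sketch below converts a few recovery times into rates. The helper name `gamma_from_recovery_time` and the sample durations are hypothetical, chosen only for this example.

```python
def gamma_from_recovery_time(recovery_time_days):
    """Return the recovery rate per day for an average recovery time in days."""
    if recovery_time_days <= 0:
        raise ValueError("recovery time must be positive")
    return 1.0 / recovery_time_days


# Illustrative recovery times only; longer recovery means a smaller gamma.
for days in (5, 10, 20):
    print(f"recovery time {days} days -> gamma = {gamma_from_recovery_time(days):.3f} per day")
```

A recovery time of 10 days, for instance, corresponds to a gamma of 0.1 per day: on average, one tenth of the infected compartment leaves it each day.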
Armed with two important ideas—beta and gamma—we can now derive the system of equations behind the SIR model.
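The first key insight is to create three different functions to represent the three different compartments, which are the building blocks of the SIR model. We let $S(t)$, $I(t)$, and $R(t)$ denote the number of susceptible, infected, and removed individuals at time $t$, respectively.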
The key here is to think about the change in each compartment rather than the exact number of individuals in a compartment at a given time. To make things simpler, let's consider the base case: How many susceptibles does one infected infect per unit time? We know that:
The table reveals that only susceptible patients can be infected, so we need to account for the probability that one infected will meet a susceptible to infect. We also know that if everyone were susceptible, one infected would infect $\beta$ individuals per unit time. Remembering that there are $I$ infecteds, we can write this as:

$$\frac{dS}{dt} = -\beta I \frac{S}{N}$$

We use the derivative $\frac{dS}{dt}$ to indicate the change in the susceptible compartment per unit time, $S$ to represent the number of susceptibles, and $N$ to represent the total population. Note the derivative is negative since these people are getting infected and leaving the susceptible compartment.
The next key insight we will use to derive this system will be to note that the population must stay constant (remember that we are assuming death does not significantly change the population structure):

$$S + I + R = N$$

In order for this to be true we must have:

$$\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0$$
So in order to balance out the negative derivative of the susceptible compartment, either the infected or recovered compartment should have a positive derivative. Since people in the susceptible compartment cannot recover or die without first being infected, everyone leaving the susceptible compartment must enter the infected compartment, so the infected compartment gains exactly what the susceptible compartment loses (the negative of the susceptible compartment's derivative).
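However, we also know that people move from the infected compartment into the removed compartment at the rate $\gamma$. Therefore we must also have

$$\frac{dI}{dt} = \beta I \frac{S}{N} - \gamma I$$

Lastly, since this last group of people are moving into the removed compartment, to ensure that the population is stable we must have

$$\frac{dR}{dt} = \gamma I$$

Putting all of this together yields the system of ordinary differential equations:

$$\begin{aligned}
\frac{dS}{dt} &= -\beta I \frac{S}{N} \\
\frac{dI}{dt} &= \beta I \frac{S}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}$$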
That's it! These equations now form the basic SIR model. However, while the base SIR model provides us with a tool for studying how many people are infected over the course of an outbreak, we can easily expand this model to include more compartments that can track hospitalizations, deaths, and other metrics.
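To see the SIR model in action, here is a minimal sketch that steps the equations forward in time with the forward Euler method. It assumes the system has the form dS/dt = -βIS/N, dI/dt = βIS/N - γI, dR/dt = γI; the function name `simulate_sir`, the step size, and the parameter values are illustrative assumptions rather than anything prescribed by epispot.

```python
def simulate_sir(beta, gamma, population, initial_infected=1, days=160, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (t, S, I, R) tuples, one per step.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * i * s / population  # flow from S to I per unit time
        new_removals = gamma * i                    # flow from I to R per unit time
        s -= new_infections * dt
        i += (new_infections - new_removals) * dt
        r += new_removals * dt
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Illustrative parameters only: beta = 0.3 and gamma = 0.1 per day, 10,000 people.
trajectory = simulate_sir(beta=0.3, gamma=0.1, population=10_000)
peak_t, _, peak_i, _ = max(trajectory, key=lambda row: row[2])
print(f"peak of about {peak_i:.0f} infected around day {peak_t:.0f}")
```

Forward Euler is used here only because it keeps the sketch dependency-free; a smaller `dt` or a proper ODE solver would give more accurate curves. Note also that running the function twice with the same inputs gives identical output, which is the deterministic behavior discussed earlier.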
To explore how we can expand this model, we'll consider a simple extension of the SIR model: the SEIR model, where the E stands for Exposed. In this model we can track not only the number of people infected but also the number of people who have the disease but cannot spread it yet. In epidemiology, the lag between exposure (having the disease) and infectiousness (spreading the disease) is known as the incubation period.
We let $\frac{1}{\delta}$ represent the incubation period in our model. Remember that, similarly to $\gamma$, it helps to use the reciprocal of the incubation period, specifically $\delta$.
We know that the derivative for the Susceptible compartment won't change because exposed individuals can't infect anyone. What does change, however, is which compartment receives the infected susceptibles. That compartment is, of course, the Exposed compartment. We can also write, similar to what we did with the Removed compartment, that the number of people leaving the Exposed compartment per unit time is equal to $\delta E$. Writing this together gives the derivative for the Exposed compartment:

$$\frac{dE}{dt} = \beta I \frac{S}{N} - \delta E$$

We also know that the individuals leaving the Exposed compartment are moving to the Infected compartment. Putting this all together yields the system of equations for the SEIR model, as shown below:

$$\begin{aligned}
\frac{dS}{dt} &= -\beta I \frac{S}{N} \\
\frac{dE}{dt} &= \beta I \frac{S}{N} - \delta E \\
\frac{dI}{dt} &= \delta E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}$$
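The SEIR system can be stepped forward in the same way as the SIR model. The sketch below is a hedged illustration: it assumes `delta` denotes the reciprocal of the incubation period (the rate at which people leave the Exposed compartment), and the function name `simulate_seir` and all parameter values are hypothetical.

```python
def simulate_seir(beta, gamma, delta, population, initial_infected=1, days=200, dt=0.1):
    """Integrate the SEIR equations with forward Euler steps.

    `delta` is assumed to be 1 / incubation period.
    Returns a list of (t, S, E, I, R) tuples, one per step.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    rows = [(0.0, s, e, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        exposures = beta * i * s / population  # flow from S to E per unit time
        onsets = delta * e                     # flow from E to I per unit time
        removals = gamma * i                   # flow from I to R per unit time
        s -= exposures * dt
        e += (exposures - onsets) * dt
        i += (onsets - removals) * dt
        r += removals * dt
        rows.append((step * dt, s, e, i, r))
    return rows


# Illustrative parameters only: a 5-day incubation period gives delta = 0.2 per day.
rows = simulate_seir(beta=0.3, gamma=0.1, delta=0.2, population=10_000)
peak_t, _, _, peak_i, _ = max(rows, key=lambda row: row[3])
print(f"peak of about {peak_i:.0f} infectious around day {peak_t:.0f}")
```

The only structural change from an SIR sketch is the extra Exposed compartment sitting between Susceptible and Infected; the total population is still conserved because every outflow appears as an inflow elsewhere.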
At this point it's worth pointing out that epidemiologists don't usually rely on the parameter $\beta$ that we have been using to simplify our models. Rather, they use a number known as R Naught ($R_0$), also called the basic reproduction number.
A very easy way to calculate $R_0$ is to express it in terms of parameters we have already defined. If $\beta$ yields the number of susceptibles infected per unit time, then we just need to multiply $\beta$ by the infectious period to calculate $R_0$. As it happens, $\gamma$ already gives us the reciprocal of that number. Taking the reciprocal of $\gamma$ itself will yield the time that an individual is infected. Lastly, multiplying this with $\beta$ yields

$$R_0 = \beta \cdot \frac{1}{\gamma} = \frac{\beta}{\gamma}$$
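The relationship between these parameters is simple enough to check numerically. The sketch below assumes R Naught equals beta multiplied by the infectious period, that is, beta divided by gamma; the helper names and the sample values are hypothetical.

```python
def basic_reproduction_number(beta, gamma):
    """R Naught as beta times the infectious period (1 / gamma)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def beta_from_r0(r0, gamma):
    """Recover beta from R Naught and gamma by inverting the relation above."""
    return r0 * gamma


# Illustrative values only: beta = 0.3 per day and a 10-day infectious period.
r0 = basic_reproduction_number(beta=0.3, gamma=0.1)
print(f"R0 is roughly {r0:.2f}")
print(f"beta recovered from R0: {beta_from_r0(r0, 0.1):.2f}")
```

Being able to go in both directions matters because it lets a model be specified with R Naught and gamma alone, with beta computed on demand.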
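We can easily substitute this back into both our models, replacing $\beta$ with $R_0 \gamma$, to obtain their standard forms. Both are shown below, first the SIR model and then the SEIR model (where $\delta$ is the reciprocal of the incubation period):

$$\begin{aligned}
\frac{dS}{dt} &= -R_0 \gamma I \frac{S}{N} \\
\frac{dI}{dt} &= R_0 \gamma I \frac{S}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}$$

$$\begin{aligned}
\frac{dS}{dt} &= -R_0 \gamma I \frac{S}{N} \\
\frac{dE}{dt} &= R_0 \gamma I \frac{S}{N} - \delta E \\
\frac{dI}{dt} &= \delta E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{aligned}$$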
What we've just seen here is how we can compile a model from various compartments, using the SIR model as a base. We also can see the formulas that govern the laws of infectious disease dynamics. However, as you can imagine, compiling these formulas for each model you want to create and evaluating them again and again is quite tiring. That's where epispot comes in!