6 Applications of SIR Models
This chapter brings the full arc of the module together. We have equations, a solver, and a method for estimating parameters. Now we use all three on real data, working through a complete example from raw observations to a fitted model to biological interpretation.
6.1 Epidemic SIR Model: A Complete Worked Example
By the end of Section 6.1, you should be able to:
- Synthesize the information from Chapters 1 through 6 to construct, solve, and analyze an SIR model using real-world data.
Let’s apply all the information and skills we just learned about to a complete worked example of an epidemic SIR model using real-world data from the 2017-2018 US influenza season.
Step 1: Introducing and Contextualizing Our Data
We will use a modified version of the data on the CDC’s FluView tool, which tracks the weekly number of positive influenza tests reported to the CDC during the 2017–18 flu season in the US (CDC 2026a). The original data reports the number of tests sent into public labs (which report to the CDC) that came back positive for each strain of influenza. We aren’t really interested in what specific strains of influenza are present, so we combined this information to just tell us how many new positive cases of influenza the labs reported each week. We also modified the time reporting of the data set. The original data set reported tests weekly starting from the 40th week of 2017 to the 39th week of 2018 (inclusive). To make the calculations and code easier, we modified the data to only report weeks since the 39th week of 2017; for simplicity, instead of saying “x weeks since the 39th week of 2017,” we will just refer to it as week x. You can view the modified data set below.
You might have noticed that this is actually incidence data, but so far, we have only gone over how to use prevalence data. We are actually going to use this data as an approximation to the true influenza prevalence. We can do this because prevalence is approximately equal to the incidence times the duration of infection (TAMU 2025). According to the CDC, people are infectious with the flu for up to one day before symptoms appear and 5-7 days after (CDC 2026b). Using this information, we can estimate the average duration of infection for influenza to be about 7 days (1 week). Our incidence data is reported weekly, so using our approximation for prevalence, we can say that the prevalence is approximately equal to our incidence data.
We are going to make another approximation about the size of the population which this data comes from (\(N\)). It might seem obvious to use the population size of the United States for \(N\), but that doesn’t account for the fact that this data only represents a very small portion of the total influenza cases seen in the 2017-18 flu season. Very few flu cases are sent to labs for testing, and even fewer of those cases are sent to the public labs which report their data to the CDC. We will assume a population size of \(N=150000\) for this example simply because it produces a nice model fit!
In a true research context, these approximations for the prevalence and population size would not hold and more exact measures would have to be used. We are just using this data as a learning example to build experience creating SIR models and apply the knowledge gained from this module, so these assumptions are okay to make.
Step 2: Visualizing Our Data and Building Model Equations
Before we construct our SIR model equations, let’s get an idea of what our data looks like plotted.
Our data follows a very nice epidemic-like curve: the disease is growing slowly from weeks 0 through 7 then grows rapidly, reaching a peak around weeks 14 to 16 before decreasing to pre-outbreak levels at a similar rate. Now that we have confirmed our data follows a typical epidemic trend, let’s build our model equations. Since we are using count instead of proportions, our model equations will look like so:
\[ \begin{aligned} \frac{dS}{dt} &= -\beta \left(\frac{I}{N}\right) S \\[10pt] \frac{dI}{dt} &= \beta \left(\frac{I}{N}\right) S - \gamma I \\[10pt] \frac{dR}{dt} &= \gamma I \end{aligned} \]
\(\beta\): Transmission rate
\(\gamma\): Recovery rate
\(N\): Population size
Read through Section 2.1 and Section 2.3 if you need a refresher on how we derive these model equations.
Step 3: Estimating Our Model Parameters
Our model only has two parameters: the recovery rate (\(\gamma\)) and the transmission rate (\(\beta\)). We should estimate the recovery rate first because we will need it later when we estimate the transmission rate using MLE.
Recalling from Section 2.2, we can estimate the recovery rate as \(\gamma=\frac{1}{D}\) where \(D\) is the average duration of infection (i.e. how long individuals are contagious with the disease). Like we mentioned in Step 1, influenza is contagious up to 1 day before symptoms appear and up to 5-7 days after. This gives us a total range of 6-8 days that a person with the flu is infectious. For the sake of simplicity, we will use the median value of this range and suppose that \(D=7 \text{ days}\). This gives us an estimated recovery rate of \(\gamma=\frac{1}{7} \text{ day}^{-1}\). Our data is measured in units of weeks, not days, so using a simple conversion our final estimated recovery rate is \(\gamma=1 \text{ week}^{-1}\).
Although people are technically infectious with the flu for 6-8 days, the contagiousness of the flu is at its highest during the first 3-4 days after symptoms start and dramatically reduces thereafter (Clinic 2022). More complex SIR models intended for research purposes would take this into account, but our simple estimation for \(\gamma\) is good enough for this application.
Now that we have an estimate for our recovery rate, let’s shift our focus to estimate our transmission rate \(\beta\) using MLE. The code we will use to do this is nearly identical to the code in Section 4.3, so we won’t go over it in detail here. Some variable names have been changed to reflect the new data set, but besides that, comments explain wherever the code below differs significantly.
If you still are confused about the process of coding an MLE algorithm, go back and read Section 4.2 and Section 4.3 before continuing!
Step 1: Extract data and code a model function.
Step 2: Code a function to return model predictions.
Step 3: Code a function to return negative log-likelihoods.
Step 4: Call mle2 and output estimate for \(\beta\).
Keep this estimate for \(\beta\) in mind because we will use it in the next section to solve and visualize our model!
Step 4: Visualizing Our Model
Now that we have estimates for our model parameters, it’s time to write our full SIR equations and solve them:
\[ \begin{aligned} \frac{dS}{dt} &= -(1.31) \left(\frac{I}{150,000}\right) S \\[10pt] \frac{dI}{dt} &= (1.31) \left(\frac{I}{150,000}\right) S - I \\[10pt] \frac{dR}{dt} &= I \end{aligned} \]
We got these equations by plugging in our estimates we found before (\(\beta=1.31\), \(N=150000\), and \(\gamma=1\)) to the SIR equations from Step 1.
Now that we have our full equations, let’s go ahead and solve them using rk4sys just like we did in Section 3.3.
Step 1: Use rk4sys to produce a data frame of solutions.
Step 2: Plotting the curves.
Plotting all three SIR curves on a single plot will not be very helpful to us since the number of susceptibles is in the hundreds of thousands while the number of infected is only in the thousands. A base R plot will be too zoomed out to show any useful details. To produce a useful plot with all three curves, a special plotting package must be used. Step 5 has a plot of all three curves that can be panned and zoomed into. In this section, we will just plot each curve separately using base R, starting with the susceptible curve:
Now let’s plot the infectious curve. We will also overlay our data to see how good of a fit our model is:
Finally, let’s plot our recovered curve:
Looking at our second plot, it appears that our model fits quite well! It’s not exactly the same as the original data, but that’s not the point: the point is to get an idea of the general trends present in the data, not to capture every small detail of it.
Step 5: Analyzing and Evaluating Our Model
Like we mentioned in Step 1, our model uses data modified from its original form, assumes an approximate equivalence between prevalence and incidence (an assumption that does not always hold in real life), and a placeholder value for the population size. We did these things for the sake of producing a nice, clean example that applies the knowledge from this module, not to produce a model with real-life applications and implications. Any conclusions we draw from this model are purely hypothetical and just for practice; it doesn’t necessarily reflect how influenza acts in real life!
Every model we build is a simplification. The question is not whether the simplification is perfect, but whether it is useful. A model that captures the general shape of an epidemic, even with rough parameter estimates, can still tell us something real about transmission dynamics and vaccination thresholds.
Now that we have that disclaimer out of the way, let’s analyze our model!
The basic reproduction number of our model is \(R_0=\frac{\beta}{\gamma}=1.31\). This means that (according to our model) people with the flu, on average, spread it to a little more than one person throughout the entire time they are sick. In line with the threshold phenomenon, our epidemic was able to spread because the initial proportion of susceptible people \(\tilde{S}(0)=\frac{150000-161}{150000}\approx99.9\%\) is much greater than the reciprocal of our basic reproduction number \(\left(\frac{1}{R_0}\approx76.3\%\right)\).
The plot below shows all three SIR curves. Hovering over any of the three curves displays the effective reproduction number at that time. Like we learned in Section 5.2, the epidemic grows when \(R_e>1\), meets its peak at \(R_e=1\), and shrinks when \(R_e<1\).
Warning: package 'deSolve' was built under R version 4.5.2
Warning: package 'dplyr' was built under R version 4.5.2
Warning: package 'tidyr' was built under R version 4.5.2
Warning: package 'plotly' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
If the curves are hard to see, try isolating a curve by double clicking on its entry in the legend then click on the “Autoscale” tool in the toolbar (tool to the left of the home icon). You can also use the “Pan” tool (icon that looks like a four-way arrow) to move the view window around.
Recalling the information from Section 5.3, we should have aimed for at least \(1-\frac{1}{R_0}\approx24\%\) of the population to be successfully vaccinated against influenza to prevent this epidemic from occurring. The influenza vaccine for the 2017-18 season had an estimated \(38\%\) efficacy rate (CDC 2026c), so the actual percentage of the population we should aim to vaccinate is closer to \(\frac{0.24}{0.38}=63\%\).
It might seem counterintuitive to estimate these values using information after the outbreak already occurred, but it’s important to note that influenza outbreaks like the one we are analyzing occur every year during a time known as “flu season” (CDC 2025). These influenza outbreaks act similarly every year. We highly encourage you to visit the CDC’s Interactive FluView tool and look at the data they have on positive influenza cases and hospitalization rates from year to year: a pattern quickly emerges showing that flu numbers peak around the end of December and early January then drop drastically from February onward.
This means that the ideal percentage of people to vaccinate for previous years’ influenza outbreaks does a decent job at informing public health measures for upcoming outbreaks. In fact, studies have estimated the basic reproduction number for seasonal influenza outbreaks to be around \(1.28\), which is not too far off from the basic reproduction number of our model (\(R_0=1.31\)) (Biggerstaff et al. 2014). Even though our model isn’t exactly reflective of how influenza acts in reality, it’s a simple yet decent approximation for how more complex SIR models of influenza would behave.