In this section we estimate a SEIR model using data from US states. Compared to the model in the introductory section, the state model adds several components to accommodate the richer data.

# Data

The data combines information on

• Daily case counts and deaths from JHU CSSE
• Daily hospitalizations, recoveries, and testing from the Covid Tracking Project
• Covid-related policy changes from Raifman et al.
• Movement from Google Mobility Reports
• Hourly workers from Homebase

# Model

For each state, the epidemic is modeled as:

where variables are defined as in the introduction, except $CC$, which is new: $CC$ denotes cumulative confirmed cases. Our data does not contain recoveries for all states, so active cases cannot be computed everywhere; it does, however, contain cumulative confirmed cases for every state.
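The display equation for the system is not reproduced here. As a rough sketch only (the exact compartments and notation are assumptions, not taken from the source), a SEIR system in which confirmed cases accumulate in proportion to a testing rate $\tau$ could look like:

$$
\begin{aligned}
\dot S &= -\beta(t)\,\frac{S I}{N}, &
\dot E &= \beta(t)\,\frac{S I}{N} - a E, \\
\dot I &= a E - \gamma I, &
\dot R &= \gamma I, \\
\dot{CC} &= \tau\, a E &&
\end{aligned}
$$

The actual model is richer than this sketch; in particular it distinguishes the rates $\gamma_1$, $\gamma_2$ and fractions $p_1$, $p_2$ discussed below.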

## Heterogeneity

Some parameters are heterogeneous across states and/or time. Specifically, we assume that for state $s$,

where $x_{s,t}$ are observables that shift infection rates. In the estimates below, $x_{s,t}$ will be indicators for whether a state of emergency, stay-at-home order, or other related policy is in place, and measures of movement and business operations.1 $\epsilon_{\beta,s}$ is an unobserved error term with mean $0$. Each of the components of $x$ is $0$ in a baseline, pre-epidemic world. Hence $\beta_{j,0}$ is the (median across states) infection rate absent any policy or behavioral response. The model imposes that $\beta_{j}(t)/\beta_{k}(t)$ are constant across states and time. I have no opinion on whether this is a good assumption.
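The display equation for this parameterization is not shown above. One log-linear form consistent with the surrounding description (a sketch; the exponential form is an assumption) is

$$
\beta_{j,s}(t) = \beta_{j,0} \exp\!\left( x_{s,t}' \alpha + \epsilon_{\beta,s} \right),
$$

which makes $\beta_{j,0}$ the baseline (median) infection rate when $x_{s,t} = 0$ and $\epsilon_{\beta,s} = 0$, and keeps the ratios $\beta_{j}(t)/\beta_{k}(t) = \beta_{j,0}/\beta_{k,0}$ constant across states and time.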

Additionally, testing rates, $\tau_s$, and the portion of people exposed at time $0$, $p_{0,s}$, vary with state. Analogously to the way $\beta$ is parameterized, we assume

and

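The display equations here are likewise not reproduced. Following the same log-linear pattern sketched for $\beta$ (an assumption, not the source's exact form):

$$
\tau_s = \tau_0 \exp(\epsilon_{\tau,s}), \qquad p_{0,s} = p_0 \exp(\epsilon_{p,s}),
$$

with the state-level errors $\epsilon_{\tau,s}$ and $\epsilon_{p,s}$ mean zero.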
Finally, we assume that $a$, $p_1$, $p_2$, $\gamma_1$, and $\gamma_2$ are common across states and time. Arguably these could vary with state demographics (e.g. older populations have lower recovery and higher death rates), and over time with strain on the medical system. We abstract from these concerns for now.

# Least Squares estimates

It is much faster to compute least squares point estimates than a full Bayesian posterior. Although the statistical properties of these estimates are unclear, they give some idea of how well the model can fit the data, and serve as good initial values for computing Bayesian posteriors. Let $\theta = (a, p, \gamma, \tau, \beta, \alpha, \epsilon)$ denote the parameters. We simply minimize

where lowercase words are variables observed in the data, and capital letters are computed from the model (and implicitly depend on $\theta$). Hospitalizations and recoveries are not observed for all states and days; in those cases the corresponding terms are simply omitted from the objective function.
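The objective function itself is not displayed above. A sum-of-squares criterion matching the description (a sketch; the exact set of terms is an assumption) is

$$
\min_\theta \sum_{s,t} \left[ (\mathit{cases}_{s,t} - CC_{s,t})^2 + (\mathit{deaths}_{s,t} - D_{s,t})^2 + (\mathit{hospitalized}_{s,t} - H_{s,t})^2 + (\mathit{recovered}_{s,t} - R_{s,t})^2 \right],
$$

where the last two terms are dropped for state–day pairs in which hospitalizations or recoveries are unobserved.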

Because all observables receive equal weight, the objective function will be dominated by the cumulative cases terms, particularly in states and on days where cumulative cases are large.
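A quick back-of-the-envelope calculation (illustrative numbers, not from the data) shows the scale problem:

```julia
# Two hypothetical residuals, with the smaller series actually fit *worse*
# in relative terms (1% error on cases vs. 10% error on deaths):
resid_cases  = (1_000_000 - 990_000)^2   # squared error on cumulative cases: 1.0e8
resid_deaths = (500 - 450)^2             # squared error on deaths: 2500
# With equal weights, the cases term still outweighs the deaths term
# by a factor of 40,000.
```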

## Estimation

```julia
using CovidSEIR, Plots, VegaLite, PrettyTables, DataFrames, JLD2
Plots.pyplot()

# Load state-level data and set up the multi-region SEIR ODE
df = CovidSEIR.statedata()
ode = CovidSEIR.MultiRegion.odeSEIR()

# Keep the 50 states plus DC (FIPS codes of 60 and above are territories)
dat = CovidSEIR.RegionsData(df[df[!,:fips].<60,:], idvar=:fips);

# Compute least squares point estimates
out = CovidSEIR.LeastSquares.leastsquares(dat, ode)
params = out.params
```