Logistic plus linear (LPL) Model
This model proposes a latent true coverage curve, which is subject to observation error. A hierarchy accounts for the effects of categorical features.
Terminology and notation
- feature: A categorical feature of the data (e.g., season, geography) that partially determines coverage
- level: A value taken on by a feature (e.g., New Jersey, or 2018/2019)
- group: A unique combination of feature levels (e.g., New Jersey in 2018/2019)
- \(t\): time since the start of the season, measured in years
- \(n_{gt}\): number of people in group \(g\), surveyed at time \(t\). Drawn from the `sample_size` column of the NIS data; called `N_tot` in the codebase.
- \(x_{gt}\): number of people in group \(g\), surveyed at time \(t\), who are vaccinated. Approximated as \(\mathrm{round}(\hat{v}_{gt} \cdot n_{gt})\), where \(\hat{v}_{gt}\) is the `estimate` column.
- \(v_g(t)\): latent true coverage among group \(g\) at time \(t\)
- \(z_{gj}\): integer index indicating the level of the \(j\)-th feature for group \(g\).
For example, let the features be season and geography, in that order. Let group 5 be associated with the fourth season and the third geography. Then \(z_{51} = 4\) and \(z_{52} = 3\).
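The two derived quantities above, \(x_{gt}\) and \(z_{gj}\), can be illustrated with a minimal sketch. The function names `vaccinated_count` and `level_indices` are hypothetical and not taken from the codebase. The sketch assumes that the estimate is a proportion in \([0, 1]\) (a percentage would need dividing by 100 first) and that level indices are 1-based and assigned in sorted order; the actual codebase may order levels differently.

```python
def vaccinated_count(estimate, sample_size):
    """Approximate x_gt as round(v_hat_gt * n_gt).

    Assumes `estimate` is a proportion in [0, 1]. Python's round() sends
    exact halves to the nearest even integer.
    """
    return round(estimate * sample_size)


def level_indices(groups):
    """Map each group's feature levels to 1-based integer indices z[g][j].

    `groups` is a list of tuples, one tuple of levels per group, with the
    features in a fixed order (here: season, geography).
    """
    n_features = len(groups[0])
    # Sorted unique levels per feature; list position defines the index.
    levels = [sorted({g[j] for g in groups}) for j in range(n_features)]
    z = [[levels[j].index(g[j]) + 1 for j in range(n_features)] for g in groups]
    return z, levels


# Illustrative data only: two seasons crossed with two geographies.
groups = [
    ("2017/2018", "Alaska"),
    ("2017/2018", "New Jersey"),
    ("2018/2019", "Alaska"),
    ("2018/2019", "New Jersey"),
]
z, levels = level_indices(groups)
x = vaccinated_count(0.42, 250)
```

Here the last group (New Jersey in 2018/2019) is at the second level of both features, so its index row is `[2, 2]`.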
Model overview
For each group \(g\), the latent coverage \(v_g(t)\) is assumed to be the sum of a logistic curve and a line with zero intercept (that is, a line passing through zero at \(t=0\)). The shape parameter \(K\) and midpoint \(\tau\) of the logistic curve are assumed to be common to all groups. The height \(A_g\) of the logistic curve is a grand mean \(\mu_A\) plus an effect \(\delta_{A,j,z_{gj}}\) for each feature \(j\). For example, the \(A_g\) for Alaska in 2018/2019 is the grand mean \(\mu_A\), plus the Alaska effect, plus the 2018/2019 effect. The model also includes a third feature, the season-geography interaction, so there is also an Alaska-in-2018/2019 effect.
The slopes \(M_g\) follow a similar pattern.
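As a rough illustration of the structure described above, the sketch below builds a group-level parameter from a grand mean plus per-feature effects and then evaluates the latent curve. It assumes the logistic term has the form \(A_g / (1 + e^{-K(t - \tau)})\) and that the linear term is \(M_g t\) with zero intercept; the function names and all numeric values are hypothetical, chosen only for the example.

```python
import math


def group_param(grand_mean, effects, z_g):
    """Grand mean plus one effect per feature: mu + sum_j delta[j][z_gj].

    `effects[j]` lists the effects for feature j, one per level;
    `z_g` holds the 1-based level index of each feature for this group.
    """
    return grand_mean + sum(effects[j][level - 1] for j, level in enumerate(z_g))


def latent_coverage(t, A_g, M_g, K, tau):
    """Logistic curve of height A_g plus a line of slope M_g through zero.

    Assumed functional form; K and tau are shared by all groups.
    """
    return A_g / (1.0 + math.exp(-K * (t - tau))) + M_g * t


# Hypothetical values. Features: season, geography, season-geography interaction.
mu_A = 0.40
delta_A = [[-0.05, 0.05], [0.02, -0.02], [0.0, 0.01, -0.01, 0.0]]
mu_M = 0.10
delta_M = [[0.01, -0.01], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
z_g = [2, 1, 3]  # second season, first geography, third interaction level

A_g = group_param(mu_A, delta_A, z_g)
M_g = group_param(mu_M, delta_M, z_g)
K, tau = 25.0, 0.25
v_mid = latent_coverage(tau, A_g, M_g, K, tau)
```

At the midpoint \(t = \tau\) the logistic term contributes exactly half its height, so `v_mid` equals `A_g / 2 + M_g * tau` under these assumptions.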
The observations \(x_{gt}\) follow a beta-binomial distribution with mean \(v_g(t) \cdot n_{gt}\), with the variance modified by an extra parameter \(D\).
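To make the role of \(D\) concrete, the sketch below evaluates a beta-binomial probability mass function under one common parameterization, \(\alpha = v D\) and \(\beta = (1 - v) D\). This parameterization is an assumption made for illustration; the text above does not specify how \(D\) enters the model. Under it, the mean is \(n v\) regardless of \(D\), and the variance is the binomial variance inflated by a factor of \((D + n)/(D + 1)\).

```python
import math


def log_beta(p, q):
    """Log of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def beta_binomial_logpmf(x, n, v, D):
    """Log-pmf of a beta-binomial with mean n * v.

    Assumed parameterization: alpha = v * D, beta = (1 - v) * D.
    """
    a, b = v * D, (1.0 - v) * D
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


# Hypothetical values: 50 people surveyed, latent coverage 0.3, D = 20.
n, v, D = 50, 0.3, 20.0
pmf = [math.exp(beta_binomial_logpmf(x, n, v, D)) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
binomial_var = n * v * (1.0 - v)
```

With these values the mean is \(n v = 15\), while the variance is larger than the binomial variance of 10.5. Larger values of \(D\) shrink the variance toward the binomial case.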
Model equations
Note that: