Fit a generalized additive model to incident cases

Incident cases are modeled as a smooth function of time with a generalized additive model (GAM). The model is fit with mgcv::gam() and some familiarity with mgcv may be helpful.

Usage

RtGam(
  cases,
  reference_date,
  group = NULL,
  day_of_week = TRUE,
  k = smooth_dim_heuristic(length(cases)),
  m = penalty_dim_heuristic(length(unique(reference_date))),
  backend = "gam",
  warn_for_diagnostic_failure = TRUE,
  ...
)

Arguments

cases: A vector of non-negative incident case counts occurring on an associated reference_date. Missing values (NAs) are not allowed.
reference_date: The associated date on which the count of incident cases occurred. Missing dates are not allowed and dates can only occur once.
group: The grouping variable for the case/reference-date pair. Not yet implemented and a value other than NULL will throw an error.
day_of_week: A boolean or a vector of custom values to be applied to the model as a random effect. If TRUE, then RtGam will use the parsed reference_date values to infer the day of week. If a vector of the same length as reference_date, then the user-supplied values are used as the day-of-week effect, overriding the actual day of week. This approach can be used to, for example, adjust for atypical reporting due to a holiday. If FALSE no day of week effect is applied.
k: An integer, the total dimension of all the smoothing basis functions. Defaults to smooth_dim_heuristic(length(cases)), which picks a reasonable estimate based on the number of provided data points. This total dimension is partitioned between the different smooths in the model. In the case of diagnostic issues in a fitted RtGam model, increasing the value of k above this default and refitting the model is a good first step. See the smooth_dim_heuristic() documentation for more information.
m: An integer, the dimension of the penalty basis for the global smooth trend. If m is greater than 1, the smooth's wiggliness can change over time. An increase in this value above the default should be done carefully. See penalty_dim_heuristic() for more information on m and when to consider changing the default.
backend: One of gam or bam; defaults to gam. In general, models should be fit with mgcv::gam(). If mgcv::gam() is too slow, mgcv::bam() converges more quickly but introduces some additional numerical error. Note that the bam backend uses the discrete = TRUE option for an additional speedup. See mgcv::bam() for more information.
warn_for_diagnostic_failure: Should warnings be issued for automatically identified diagnostic issues? Defaults to TRUE. A list of quantitative model diagnostics can be inspected in the diagnostics slot of the returned RtGam object.
...: Additional arguments passed to the specified modeling backend. For example, the default negative binomial error structure could be changed to poisson in the default mgcv::gam backend by passing family = "poisson".

Value

A fitted model object of type RtGam. The object has named elements:

model: The fitted mgcv model object
data: The processed data.frame used to fit the RtGam model
min_date and max_date: The minimum and maximum reference_date provided
k: The user-provided k argument
m: The user-provided m argument
backend: The user-provided backend argument
formula: The formula object provided to the model
diagnostics: The quantitative diagnostics of the model fit

Model specification

Incident cases ($y$) are modeled as smoothly changing over time:

$$\text{log}\{E(y)\} = \alpha + f_{\text{global}}(t) + \omega(\text{wday}(t))$$

where incidence is negative-binomially distributed, $f(t)$ is a smooth function of time, and $\omega(\text{wday}(t))$ is a random day-of-week effect.

References

Mellor, Jonathon, et al. "An Application of Nowcasting Methods: Cases of Norovirus during the Winter 2023/2024 in England." medRxiv (2024): 2024-07.

Ward, Thomas, et al. "Growth, reproduction numbers and factors affecting the spread of SARS-CoV-2 novel variants of concern in the UK from October 2020 to July 2021: a modelling analysis." BMJ open 11.11 (2021): e056636.

Examples

withr::with_seed(12345, {
  cases <- rpois(20, 10)
})
reference_date <- seq.Date(
  from = as.Date("2023-01-01"),
  length.out = 20,
  by = "day"
)
fit <- RtGam(cases, reference_date)
fit
#> ===============================
#> Fitted RtGam model object (gam)
#> ===============================
#> 
#> Model type: Non-adaptive (m = 1)
#> Specified maximum smoothing basis dimension: 16 
#> Family: Negative Binomial(988135.625) 
#> Link function: log
#> Using day-of-week effects
#> ===============================
#> 
#> Observed data points: 20
#> Distinct reference dates: 20
#> Distinct groups: 1
#> Day-of-week levels: 7
#>