Propose penalty basis dimension from the number of distinct dates

Return a reasonable value for the m argument of RtGam() based on the number of dates on which cases are observed. The m argument sets the dimension of the smoothing penalty basis for the model's global smooth trend (see the Model specification section of the RtGam() documentation for more information about the global trend). The penalty basis dimension determines how much the wiggliness of the global smooth trend can vary over time. Higher values of m help the model adapt quickly to different epidemic regimes, but are computationally costly.

Usage

penalty_dim_heuristic(n, period = 56)

Arguments

n: An integer, the number of dates with an associated case observation.
period: An integer, the scaling factor used by the dimensionality heuristic. See Implementation details for discussion. Defaults to 56.

Value

An integer, the proposed penalty basis dimension to be used by the global trend.

How `m` is used

The parameter m controls the penalty basis dimension of the model's global smooth trend. If m is 1, there will be a single constant penalty on wiggliness over the entire smooth, and RtGam will use a thin-plate spline basis for its superior performance in single-penalty settings. If m is 2 or more, the model will use m distinct penalties on the smooth trend's wiggliness and an adaptive spline basis. The realized penalty at each timepoint smoothly interpolates between the m estimated wiggliness penalties. This adaptive penalty increases the computational cost of the model, but allows a single model to adapt to changing epidemic dynamics without oversmoothing or introducing spurious wiggly trends.
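
The switch between the two bases can be sketched as follows. This is a minimal illustration, not the package source: the helper `smooth_term_sketch()`, the variable name `timestep`, and the basis dimension `k` are assumptions made for the example. Only the mgcv basis codes "tp" (thin-plate) and "ad" (adaptive) are taken from mgcv itself.

```r
# Hypothetical helper: build the text of an mgcv smooth term for a given m.
# `timestep` and `k` are illustrative names, not taken from the RtGam source.
smooth_term_sketch <- function(k, m) {
  stopifnot(length(m) == 1, m >= 1, length(k) == 1, k >= 1)
  k <- as.integer(k)
  m <- as.integer(m)
  if (m == 1L) {
    # Single constant penalty: thin-plate spline basis
    sprintf('s(timestep, k = %d, bs = "tp")', k)
  } else {
    # m penalties, smoothly interpolated over time: adaptive spline basis
    sprintf('s(timestep, k = %d, m = %d, bs = "ad")', k, m)
  }
}

smooth_term_sketch(k = 20, m = 1)
smooth_term_sketch(k = 20, m = 3)
```

Returning the term as text keeps the sketch free of dependencies; a real model would pass the equivalent term inside a formula to mgcv.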

When to use a different value

Very slow

Decreasing the penalty basis dimension makes the model less demanding to fit. mgcv describes an adaptive penalty with 10 basis dimensions and 200 data points as roughly equivalent to fitting 10 GAMs each from 20 data points. Using a single penalty throughout the model is much simpler than using an adaptive smooth and should be preferred where possible. See mgcv::smooth.construct.ad.smooth.spec() for more information on how the adaptive smooth basis uses the penalty dimension.

Observed over-smoothing of non-stationary data

If a fitted model is observably over-smoothing, it may be reasonable to refit with a higher penalty basis dimension. Moments with a sudden change in epidemic dynamics, such as a sharp epidemic peak, can be challenging to fit with smooth functions. This option should be used with care due to the increased computational cost.

Implementation details

The heuristic sets m to \(\lfloor \frac{n}{\text{period}} \rfloor + 1\), which with the default period of 56 is \(\lfloor \frac{n}{56} \rfloor + 1\), where \(n \in \mathbb{W}\) is the number of observed dates. It assumes that epidemic dynamics stay about equally wiggly over an 8-week period, so a sharp jump or drop requiring a very wiggly trend would remain similarly plausible over much of the 8-week band.
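
The formula can be written out in a few lines of R. This is a sketch under stated assumptions rather than the package's own implementation: `penalty_dim_sketch()` is a hypothetical name, and the input checks are illustrative.

```r
# Hypothetical re-implementation of the heuristic floor(n / period) + 1.
# Not the package source; input validation here is an assumption.
penalty_dim_sketch <- function(n, period = 56L) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 0)
  stopifnot(is.numeric(period), length(period) == 1, period > 0)
  as.integer(floor(n / period) + 1)
}

penalty_dim_sketch(3)    # 1: under one period, so a single penalty
penalty_dim_sketch(55)   # 1
penalty_dim_sketch(56)   # 2: one full period has elapsed
penalty_dim_sketch(200)  # 4: floor(200 / 56) = 3, plus 1
```

Under this formula, any series shorter than one period gets m = 1 and therefore the single-penalty thin-plate basis; each additional full period adds one penalty.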

Examples

# Default use invokes `unique()` in case of repeated dates from groups
reference_date <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-03"))
m <- penalty_dim_heuristic(length(unique(reference_date)))