Joint inference of count data (e.g. cases/admissions) and wastewater data
Source:R/wwinference.R
wwinference.Rd
Provides a user friendly interface around package functionality to produce estimates, nowcasts, and forecasts pertaining to user-specified delay distributions, set parameters, and priors that can be modified to handle different types of "global" count data and "local" wastewater concentration data using a Bayesian hierarchical framework applied to the two distinct data sources. By default the model assumes a fixed generation interval and delay from infection to the event that is counted. See the getting started vignette for an example model specifications fitting COVID-19 hospital admissions from a hypothetical state and wasteawter concentration data from multiple sites within that state.
Usage
wwinference(
ww_data,
count_data,
forecast_date = NULL,
calibration_time = 90,
forecast_horizon = 28,
model_spec = get_model_spec(),
fit_opts = list(),
generate_initial_values = TRUE,
initial_values_seed = NULL,
compiled_model = compile_model()
)
# S3 method for class 'wwinference_fit'
print(x, ...)
# S3 method for class 'wwinference_fit'
summary(object, ...)
Arguments
- ww_data
A dataframe containing the pre-processed, site-level wastewater concentration data for a model run. The dataframe must contain the following columns:
date
,site
,lab
,log_genome_copies_per_ml
,lab_site_index
,log_lod
,below_lod
,site_pop
exclude
.- count_data
A dataframe containing the pre-procssed, "global" (e.g. state) daily count data, pertaining to the number of events that are being counted on that day, e.g. number of daily cases or daily hospital admissions. Must contain the following columns:
date
,count
,total_pop
- forecast_date
a character string in ISO8601 format (YYYY-MM-DD) indicating the date that the forecast is to be made. Default is NULL
- calibration_time
integer indicating the number of days to calibrate the model for, default is
90
- forecast_horizon
integer indicating the number of days, including the forecast date, to produce forecasts for, default is
28
- model_spec
The model specification parameters as defined using
get_model_spec()
. The default here pertains to theforecast_date
in the example data provided by the package, but this should be specified by the user based on the date they are producing a forecast- fit_opts
MCMC fitting options, as a list of keys and values. These are passed as keyword arguments to
compiled_model$sample()
. Where no option is specified,wwinference()
will fall back first on a package-specific default value given byget_mcmc_options()
, if one exists. If no package-specific default exists,wwinference()
will fall back on the default value defined in$sample()
. See the documentation for$sample()
for details on available options.- generate_initial_values
Boolean indicating whether or not to specify the initialization of the sampler, default is
TRUE
, meaning that initialization lists will be generated and passed as theinit
argument to the model object$sample()
call. function- initial_values_seed
set of integers indicating the random seed of the R sampler of the initial values, default is
NULL
- compiled_model
The pre-compiled model as defined using
compile_model()
- x, object
Object of class
wwinference_fit
- ...
Additional parameters passed to the corresponding method
Value
An object of the ww_inference_fit
class containing the following
items that are intended to be passed to downstream functions to do things
like extract posterior draws, get diangostic behavior, and plot results
(for example). If the model runs, this
function will return:
fit
: The CmdStan object that is returned from the call to
cmdstanr::sample()
. Can be used to access draws, summary, diagnostics, etc.
raw_input_data
: a list containing the input ww_data
and the input
count_data
used in the model.
stan_data_list
: a list containing the inputs passed directly to the
stan model
fit_opts
: a list of the MCMC specifications passed to stan
If the model fails to run, a list containing the follow will be returned:
error
: the error message provided from stan, indicating why the model
failed to run. Note, the model might still run and produce draws even if it
has major model issues. We recommend the user always run the
check_diagnostics()
function on the parameter_diagnostics
as part of any
pipeline to ensure model convergence.
The print method prints out information about the model and returns the object invisibly.
The summary method returns the outcome from the
$summary
(cmdstanr::summary()
) function.
See also
Other diagnostics:
get_model_diagnostic_flags()
,
parameter_diagnostics()
,
summary_diagnostics()
Examples
if (FALSE) { # \dontrun{
ww_data <- tibble::tibble(
date = rep(seq(
from = lubridate::ymd("2023-08-01"),
to = lubridate::ymd("2023-11-01"),
by = "weeks"
), 2),
site = c(rep(1, 14), rep(2, 14)),
lab = c(rep(1, 28)),
conc = log(abs(rnorm(28, mean = 500, sd = 50))),
lod = log(c(rep(20, 14), rep(15, 14))),
site_pop = c(rep(2e5, 14), rep(4e5, 14))
)
ww_data_preprocessed <- preprocess_ww_data(ww_data,
conc_col_name = "conc",
lod_col_name = "lod"
)
input_ww_data <- indicate_ww_exclusions(ww_data_preprocessed)
hosp_data <- tibble::tibble(
date = seq(
from = lubridate::ymd("2023-07-01"),
to = lubridate::ymd("2023-10-30"),
by = "days"
),
daily_admits = sample(5:70, 122, replace = TRUE),
state_pop = rep(1e6, 122)
)
input_count_data <- preprocess_count_data(
hosp_data,
"daily_admits",
"state_pop"
)
generation_interval <- to_simplex(c(0.01, 0.2, 0.3, 0.2, 0.1, 0.1, 0.01))
inf_to_count_delay <- to_simplex(c(
rep(0.01, 12), rep(0.2, 4),
rep(0.01, 10)
))
infection_feedback_pmf <- generation_interval
params <- get_params(
system.file("extdata", "example_params.toml",
package = "wwinference"
)
)
forecast_date <- "2023-11-06"
calibration_time <- 90
forecast_horizon <- 28
include_ww <- 1
ww_fit <- wwinference(
ww_data = input_ww_data,
count_data = input_count_data,
forecast_date = forecast_date,
calibration_time = calibration_time,
forecast_horizon = forecast_horizon,
model_spec = get_model_spec(
generation_interval = generation_interval,
inf_to_count_delay = inf_to_count_delay,
infection_feedback_pmf = infection_feedback_pmf,
params = params
),
fit_opts = list(
iter_warmup = 250,
iter_sampling = 250,
chains = 2
)
)
} # }