
Generates hospital admissions and site-level wastewater data directly from the generative model, using the conditions and parameters specified by the user.

Usage

generate_simulated_data(
  r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
  n_sites = 4,
  ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
  pop_size = 3e+06,
  site = c(1, 1, 2, 3, 4),
  lab = c(1, 2, 3, 3, 3),
  ot = 90,
  nt = 9,
  forecast_horizon = 28,
  sim_start_date = lubridate::ymd("2023-09-01"),
  hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7,
  i0_over_n = 5e-04,
  initial_growth = 1e-04,
  sd_in_lab_level_multiplier = 0.25,
  mean_obs_error_in_ww_lab_site = 0.2,
  mean_reporting_freq = 1/5,
  sd_reporting_freq = 1/20,
  mean_reporting_latency = 7,
  sd_reporting_latency = 3,
  mean_log_lod = 5,
  sd_log_lod = 0.2,
  global_rt_sd = 0.03,
  sigma_eps = 0.05,
  sd_i0_over_n = 0.5,
  if_feedback = FALSE,
  input_params_path = fs::path_package("extdata", "example_params.toml", package =
    "wwinference")
)
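The default values above are meant to fit together: site and lab are paired element by element to define lab-site observations, ww_pop_sites has one entry per site, and hosp_wday_effect sums to 1. The sketch below checks these relationships in base R. check_sim_inputs() is a hypothetical helper written only for illustration; it is not part of wwinference, and its checks are inferred from the argument descriptions rather than taken from the package's own validation.

```r
# Hypothetical helper (NOT part of wwinference): sanity-check that the
# simulation inputs are mutually consistent, based on the argument
# descriptions for generate_simulated_data().
check_sim_inputs <- function(r_in_weeks, n_sites, ww_pop_sites, pop_size,
                             site, lab, hosp_wday_effect) {
  stopifnot(
    # one catchment population per site
    length(ww_pop_sites) == n_sites,
    # site and lab are paired element-wise, one entry per lab-site
    length(site) == length(lab),
    # every site index refers to one of the n_sites sites
    all(site %in% seq_len(n_sites)),
    # the day-of-week effect is a simplex of length 7
    length(hosp_wday_effect) == 7,
    all(hosp_wday_effect >= 0),
    isTRUE(all.equal(sum(hosp_wday_effect), 1))
  )
  # summarize the dimensions implied by the inputs
  list(
    n_weeks = length(r_in_weeks),
    n_lab_sites = length(site),
    n_labs = length(unique(lab)),
    frac_pop_in_catchments = sum(ww_pop_sites) / pop_size
  )
}

# Default values copied from the Usage section
defaults <- check_sim_inputs(
  r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
  n_sites = 4,
  ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
  pop_size = 3e+06,
  site = c(1, 1, 2, 3, 4),
  lab = c(1, 2, 3, 3, 3),
  hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7
)
str(defaults)
```

With the defaults, the summary reports 26 weeks of R(t), 5 lab-site observation streams from 3 labs, and catchments covering 25% of the state population.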

Arguments

r_in_weeks

vector indicating the mean weekly R(t) that drives infection dynamics at the state level. Random noise (see global_rt_sd) is added to it to introduce week-to-week variation.

n_sites

integer indicating the number of sites

ww_pop_sites

vector indicating the population size in the catchment area of each site; its order must match the site indices

pop_size

integer indicating the population size in the hypothetical state, default is 3e6

site

vector of integers indicating which site (WWTP) each separate lab-site observation comes from

lab

vector of integers indicating which lab each lab-site observation comes from; it is paired element-wise with site, so both vectors must have the same length

ot

integer indicating the observed time: the length of the hospital admissions calibration period, in days

nt

integer indicating the nowcast time: length of time between last hospital admissions date and forecast date in days

forecast_horizon

integer indicating the duration of the forecast in days (e.g., 28)

sim_start_date

date, or character string in ISO 8601 format (YYYY-MM-DD), indicating the start date of the simulation, used to get a weekday vector. The default is created with lubridate::ymd().

hosp_wday_effect

a numeric vector of length 7 that is a simplex (non-negative entries summing to 1), describing how hospital admissions are distributed over the week, starting at Monday = 1

i0_over_n

float between 0 and 1 indicating the initial per capita infections in the state

initial_growth

float indicating the daily exponential growth rate of infections during the unobserved time

sd_in_lab_level_multiplier

float indicating the standard deviation of the log of the site-lab-level multiplier, which determines how much each site-lab deviates systematically from the state mean

mean_obs_error_in_ww_lab_site

float indicating the mean day-to-day variation in observed wastewater concentrations across all lab-sites

mean_reporting_freq

float indicating the mean frequency of wastewater measurements across sites, in samples per day (e.g., 1/7 is once per week)

sd_reporting_freq

float indicating the standard deviation in the frequency of wastewater measurements across sites

mean_reporting_latency

float indicating the mean time, in days, from the forecast date to the last wastewater sample collection date, across sites

sd_reporting_latency

float indicating the standard deviation in the time from the forecast date to the last wastewater sample collection date, across sites

mean_log_lod

float indicating the mean log of the limit of detection (LOD) in each lab-site

sd_log_lod

float indicating the standard deviation in the log of the LOD across sites

global_rt_sd

float indicating the standard deviation of the random noise added to the passed-in weekly R(t) to introduce variability. Default is 0.03

sigma_eps

float indicating the standard deviation between the log of the state R(t) and the log of the subpopulation R(t) across time. Default is 0.05

sd_i0_over_n

float indicating the standard deviation in the log of the initial infections per capita across subpopulations. Default is 0.5

if_feedback

logical indicating whether to include infection feedback in the infection process. Default is FALSE, which sets the strength of the infection feedback to 0; if TRUE, an infection feedback strength drawn from the prior is applied.

input_params_path

path to the toml file with the parameters to use to generate the simulated data
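The reporting arguments describe how sites vary around a mean sampling frequency and reporting latency. As a rough illustration of what that variation can look like, the sketch below draws per-site reporting frequencies and latencies. draw_site_reporting() is a hypothetical helper, and the normal draws, the bounds, and the rounding are assumptions for illustration, not necessarily how wwinference draws these quantities internally.

```r
# Hypothetical sketch (NOT wwinference's implementation): draw site-level
# reporting characteristics around the means given by the arguments.
# Normal draws, bounds, and rounding are illustrative assumptions.
draw_site_reporting <- function(n_sites,
                                mean_reporting_freq = 1 / 5,
                                sd_reporting_freq = 1 / 20,
                                mean_reporting_latency = 7,
                                sd_reporting_latency = 3) {
  # Samples per day, bounded to at most one sample per day and (assumed
  # floor) at least one sample every 28 days
  freq <- rnorm(n_sites, mean = mean_reporting_freq, sd = sd_reporting_freq)
  freq <- pmin(pmax(freq, 1 / 28), 1)
  # Whole days from the forecast date back to the last sample collection
  # date, truncated at zero (assumed)
  latency <- round(rnorm(n_sites, mean = mean_reporting_latency,
                         sd = sd_reporting_latency))
  latency <- pmax(latency, 0)
  data.frame(
    site = seq_len(n_sites),
    reporting_freq = freq,
    mean_days_between_samples = 1 / freq,
    reporting_latency = latency
  )
}

set.seed(42)
sched <- draw_site_reporting(n_sites = 4)
print(sched)
```

With mean_reporting_freq = 1/5, a typical site reports about once every 5 days, and the spread set by sd_reporting_freq makes some sites report more or less often than that.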

Value

a list containing three data frames. hosp_data contains the number of daily hospital admissions for a hypothetical US state over the specified calibration period. hosp_data_eval contains the number of daily hospital admissions for the same hypothetical state over the entire evaluation period. ww_data contains the measured wastewater concentrations in each site, alongside other metadata needed to model that data.

Examples

library(wwinference)

# Generate a simulated dataset from a hypothetical state with 6 sites and 3
# labs; site 6 is measured by two different labs (labs 2 and 3)
sim_data <- generate_simulated_data(
  n_sites = 6,
  site = c(1, 2, 3, 4, 5, 6, 6),
  lab = c(1, 1, 1, 1, 2, 2, 3),
  ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5),
  pop_size = 2e6
)

# Extract the three data frames described under Value
hosp_data <- sim_data$hosp_data
hosp_data_eval <- sim_data$hosp_data_eval
ww_data <- sim_data$ww_data
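In the example, site and lab are paired element by element, so seven lab-site observation streams come from six sites and three labs. Tabulating the two vectors in base R makes the pairing explicit; this sketch uses only the vectors from the example and does not call wwinference.

```r
# site and lab vectors from the example above
site <- c(1, 2, 3, 4, 5, 6, 6)
lab <- c(1, 1, 1, 1, 2, 2, 3)

# each (site, lab) pair is one lab-site observation stream
lab_sites <- data.frame(lab_site = seq_along(site), site = site, lab = lab)
print(lab_sites)

# rows are sites, columns are labs; a 1 marks a lab-site pair
pairs <- table(site = site, lab = lab)
print(pairs)

# sites measured by more than one lab
multi_lab_sites <- as.integer(names(which(rowSums(pairs > 0) > 1)))
print(multi_lab_sites)
```

Only site 6 appears in more than one column, which is why the example passes seven entries to site and lab but sets n_sites = 6.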