Generates simulated hospital admissions and site-level wastewater data directly from the generative model, under conditions and parameters specified by the user.


generate_simulated_data(
  r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
  n_sites = 4,
  ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
  pop_size = 3e+06,
  site = c(1, 1, 2, 3, 4),
  lab = c(1, 2, 3, 3, 3),
  ot = 90,
  nt = 9,
  forecast_horizon = 28,
  sim_start_date = lubridate::ymd("2023-09-01"),
  hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7,
  i0_over_n = 5e-04,
  initial_growth = 1e-04,
  sd_in_lab_level_multiplier = 0.25,
  mean_obs_error_in_ww_lab_site = 0.2,
  mean_reporting_freq = 1/5,
  sd_reporting_freq = 1/20,
  mean_reporting_latency = 7,
  sd_reporting_latency = 3,
  mean_log_lod = 5,
  sd_log_lod = 0.2,
  global_rt_sd = 0.03,
  sigma_eps = 0.05,
  sd_i0_over_n = 0.5,
  if_feedback = FALSE,
  input_params_path = fs::path_package("extdata", "example_params.toml", package =



r_in_weeks: vector indicating the mean weekly R(t) that drives infection dynamics at the state level. Random noise is added to this vector to introduce week-to-week variation.
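To make the jittering step concrete, here is a minimal sketch using the default values from the usage block. The noise model (independent normal noise on the log scale) and the names `r_weeks_jittered` and `r_daily` are assumptions for illustration; the package may apply the noise differently.

```r
set.seed(123)

# Defaults copied from the usage block
r_in_weeks <- c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16)
global_rt_sd <- 0.03

# Assumed noise model: independent normal noise added on the log scale
noise <- rnorm(length(r_in_weeks), mean = 0, sd = global_rt_sd)
r_weeks_jittered <- exp(log(r_in_weeks) + noise)

# Expand to a daily series by repeating each weekly value 7 times
r_daily <- rep(r_weeks_jittered, each = 7)
```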


n_sites: integer indicating the number of sites


ww_pop_sites: vector indicating the population size in the catchment area of each of those sites (order must match)


pop_size: integer indicating the population size of the hypothetical state, default is 3e6


site: vector of integers indicating which site (WWTP) each separate lab-site observation comes from


lab: vector of integers indicating which lab each lab-site observation comes from
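The site and lab vectors are read position by position, so the defaults in the usage block describe five lab-site combinations. The sketch below lays them out as a table and checks that the vectors are mutually consistent. The data frame `lab_site` and its column names are illustrative assumptions, not objects created by the package.

```r
# Defaults copied from the usage block
n_sites <- 4
ww_pop_sites <- c(4e5, 2e5, 1e5, 5e4)
site <- c(1, 1, 2, 3, 4)
lab <- c(1, 2, 3, 3, 3)

# One row per lab-site observation stream (illustrative table only)
lab_site <- data.frame(
  lab_site_index = seq_along(site),
  site = site,
  lab = lab,
  site_pop = ww_pop_sites[site]
)

# Consistency checks implied by the argument descriptions
stopifnot(
  length(site) == length(lab),
  length(ww_pop_sites) == n_sites,
  all(site %in% seq_len(n_sites))
)

print(lab_site)
```

Here site 1 is measured by two labs, so it appears in two rows with the same catchment population.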


ot: integer indicating the observed time, i.e. the length of the hospital admissions calibration period in days


nt: integer indicating the nowcast time, i.e. the number of days between the last hospital admissions date and the forecast date


forecast_horizon: integer indicating the duration of the forecast in days, e.g. 28 days
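The three time arguments partition the simulated period into calibration, nowcast, and forecast windows. The following sketch shows how they could map onto calendar dates with the default values. It assumes that day 1 is the simulation start date and ignores any unobserved period before it; the variable names are hypothetical.

```r
# Defaults copied from the usage block
ot <- 90                # calibration (observed) days
nt <- 9                 # nowcast days
forecast_horizon <- 28  # forecast days
sim_start_date <- as.Date("2023-09-01")
r_in_weeks <- c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16)

# Total number of simulated days and the matching date vector
n_days <- ot + nt + forecast_horizon
dates <- seq(sim_start_date, by = "day", length.out = n_days)

# Assumed labeling: day 1 is sim_start_date
last_hosp_date <- dates[ot]
forecast_date <- dates[ot + nt]
last_forecast_date <- dates[n_days]

# The weekly R(t) vector should span at least the simulated days
stopifnot(length(r_in_weeks) * 7 >= n_days)
```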


sim_start_date: character string in ISO8601 format YYYY-MM-DD indicating the start date of the simulation, used to get a weekday vector


hosp_wday_effect: a simplex vector of length 7 describing how hospital admissions are spread out over a week, starting at Monday = 1
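A quick way to see what the simplex requirement means is to check the default value and map it onto dates. This sketch uses base R only; `wday_index` and `daily_multiplier` are hypothetical names, and the rescaling by 7 is just one way to read the effect as a relative multiplier.

```r
# Defaults copied from the usage block
hosp_wday_effect <- c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7
sim_start_date <- as.Date("2023-09-01")

# A simplex has non-negative entries that sum to 1
stopifnot(
  length(hosp_wday_effect) == 7,
  all(hosp_wday_effect >= 0),
  isTRUE(all.equal(sum(hosp_wday_effect), 1))
)

# Weekday index with Monday = 1
# (as.POSIXlt()$wday runs from 0 for Sunday to 6 for Saturday)
dates <- seq(sim_start_date, by = "day", length.out = 14)
wday_index <- (as.POSIXlt(dates)$wday + 6) %% 7 + 1

# Relative multiplier on each date; a value of 1 is an average day
daily_multiplier <- 7 * hosp_wday_effect[wday_index]
```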


i0_over_n: float between 0 and 1 indicating the initial per capita infections in the state


initial_growth: float indicating the daily exponential growth rate in infections during the unobserved time
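As a rough illustration of how these two values interact, the sketch below converts the per capita value to a count and applies exponential growth. The 50-day length of the unobserved period is a made-up value for this example, not a package default, and the names `i0` and `infections` are hypothetical.

```r
# Defaults copied from the usage block
pop_size <- 3e6
i0_over_n <- 5e-4
initial_growth <- 1e-4

# Initial number of infections implied by the per capita value
i0 <- i0_over_n * pop_size

# Illustrative exponential trajectory over an assumed unobserved period
n_unobserved_days <- 50  # hypothetical value for this sketch
t <- seq_len(n_unobserved_days)
infections <- i0 * exp(initial_growth * t)
```

With a growth rate this close to zero, infections stay nearly flat over the unobserved period.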


sd_in_lab_level_multiplier: float indicating the standard deviation in the log of the site-lab level multiplier, which determines how much site-labs vary systematically from the state mean


mean_obs_error_in_ww_lab_site: float indicating the mean day-to-day variation in observed wastewater concentrations across all lab-sites


mean_reporting_freq: float indicating the mean frequency of wastewater measurements across sites, in samples per day (e.g. 1/7 is once per week)


sd_reporting_freq: float indicating the standard deviation in the frequency of wastewater measurements across sites


mean_reporting_latency: float indicating the mean time from the last wastewater sample collection date to the forecast date, across sites


sd_reporting_latency: float indicating the standard deviation in the time from the last wastewater sample collection date to the forecast date, across sites
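The reporting frequency and latency arguments together shape each site's sampling schedule. The sketch below shows one way such a schedule could be drawn from the defaults in the usage block. The distributional choices (clipped normal draws per site, Bernoulli sampling per day) and all variable names are assumptions for illustration, not the package's documented procedure.

```r
set.seed(2024)

# Defaults copied from the usage block
n_sites <- 4
ot <- 90
nt <- 9
mean_reporting_freq <- 1 / 5
sd_reporting_freq <- 1 / 20
mean_reporting_latency <- 7
sd_reporting_latency <- 3

# Assumed: per-site frequency and latency drawn from normal
# distributions, then clipped to valid ranges
site_freq <- rnorm(n_sites, mean_reporting_freq, sd_reporting_freq)
site_freq <- pmin(pmax(site_freq, 0), 1)
site_latency <- rnorm(n_sites, mean_reporting_latency, sd_reporting_latency)
site_latency <- pmax(round(site_latency), 0)

# Sampling days per site: Bernoulli draws on each day up to that
# site's last collection day
forecast_day <- ot + nt
sample_days <- lapply(seq_len(n_sites), function(i) {
  last_day <- forecast_day - site_latency[i]
  which(rbinom(last_day, size = 1, prob = site_freq[i]) == 1)
})
```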


mean_log_lod: float indicating the mean log of the limit of detection (LOD) in each lab-site


sd_log_lod: float indicating the standard deviation in the log of the LOD across sites
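To illustrate how the two LOD arguments might be used, this sketch draws one log-scale LOD per lab-site and flags observations that fall below it. The normal draw is an assumption, and the concentrations in `log_conc` are made-up numbers for illustration only.

```r
set.seed(1)

# Defaults copied from the usage block
mean_log_lod <- 5
sd_log_lod <- 0.2
n_lab_sites <- 5  # length of the default site and lab vectors

# Assumed: one log-scale LOD per lab-site, drawn from a normal distribution
log_lod <- rnorm(n_lab_sites, mean = mean_log_lod, sd = sd_log_lod)

# Made-up log concentrations, one per lab-site
log_conc <- c(4.2, 5.6, 4.9, 6.1, 5.3)

# Flag observations that fall below the LOD of their lab-site
below_lod <- log_conc < log_lod
```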


global_rt_sd: float indicating the standard deviation of the random noise added to the passed-in weekly R(t) to introduce variability. Default is 0.03


sigma_eps: float indicating the standard deviation, on the log scale, between the state R(t) and the subpopulation R(t) across time. Default is 0.05


sd_i0_over_n: float indicating the standard deviation in the log of initial infections per capita, default is 0.5


if_feedback: Boolean indicating whether to include infection feedback in the infection process. Default is FALSE, which sets the strength of the infection feedback to 0. If TRUE, an infection feedback drawn from the prior is applied.


input_params_path: path to the TOML file with the parameters used to generate the simulated data


A list containing three data frames:
hosp_data: the number of daily hospital admissions for a theoretical US state, for the duration of the specified calibration period.
hosp_data_eval: the number of daily hospital admissions for a theoretical US state, for the entire evaluation period.
ww_data: the measured wastewater concentrations in each site, alongside other metadata necessary for modeling that data.


if (FALSE) { # \dontrun{
# Generate a simulated dataset from a hypothetical state with 6 sites and 3
# different labs
sim_data <- generate_simulated_data(
  n_sites = 6,
  site = c(1, 2, 3, 4, 5, 6, 6),
  lab = c(1, 1, 1, 1, 2, 2, 3),
  ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5),
  pop_size = 2e6
)
hosp_data <- sim_data$hosp_data
ww_data <- sim_data$ww_data
} # }