Generate simulated data from the underlying model's generative process
Source: R/generate_simulated_data.R
Generates hospital admissions and site-level wastewater data directly from the generative model, under conditions and parameters specified by the user.
Usage
generate_simulated_data(
  site_level_inf_dynamics = TRUE,
  site_level_conc_dynamics = FALSE,
  r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
  n_sites = 4,
  ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
  pop_size = 1e+06,
  n_lab_sites = 5,
  map_site_to_lab = c(1, 1, 2, 3, 4),
  ot = 90,
  nt = 9,
  forecast_time = 28,
  sim_start_date = ymd("2023-10-30"),
  hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7,
  i0_over_n = 5e-04,
  initial_growth = 1e-04,
  sd_in_lab_level_multiplier = 0.25,
  mean_obs_error_in_ww_lab_site = 0.3,
  mean_reporting_freq = 1/7,
  sd_reporting_freq = 1/14,
  mean_reporting_latency = 7,
  sd_reporting_latency = 5,
  mean_log_lod = 3.8,
  sd_log_lod = 0.2,
  example_params_path = fs::path_package("extdata", "example_params.toml", package =
    "cfaforecastrenewalww")
)
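As a minimal sketch of a call with the documented defaults, the snippet below guards on the package being installed and then inspects the two list elements named in the Value section. It assumes the function is exported from cfaforecastrenewalww and that it draws from R's random number generator (hence the `set.seed()` call); neither is confirmed by this page.

```r
# Sketch only: runs the simulation when the package is available,
# otherwise leaves `sim` as NULL.
sim <- NULL
if (requireNamespace("cfaforecastrenewalww", quietly = TRUE)) {
  set.seed(1) # assumption: the simulation uses R's RNG
  sim <- cfaforecastrenewalww::generate_simulated_data()

  # The Value section names these two elements.
  str(sim$param_df)
  head(sim$example_df)
}
```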
Arguments
- site_level_inf_dynamics
if TRUE, the toy data has variation in the site-level R(t); if FALSE, each site has the same underlying R(t) as the state
- site_level_conc_dynamics
if TRUE, the toy data has day-to-day variation in the site-level concentration; if FALSE, the relationship from infections to concentration is the same across sites
- r_in_weeks
The mean weekly R(t) that drives infection dynamics at the state level. This gets jittered with random noise to add week-to-week variation.
- n_sites
Number of sites
- ww_pop_sites
Catchment area population of each site (order must match the sites)
- pop_size
Population size in the state
- n_lab_sites
Number of unique combinations of labs and sites. Must be greater than or equal to n_sites
- map_site_to_lab
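Vector mapping the sites to the lab-sites, in order of the sites. In the default, c(1, 1, 2, 3, 4), there is one entry per lab-site, and each entry is a site index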
- ot
observed time: length of hospital admissions calibration time in days
- nt
nowcast time: length of time between last hospital admissions date and forecast date in days
- forecast_time
duration of the forecast in days, e.g. 28
- sim_start_date
the start date of the simulation, used to get a weekday vector
- hosp_wday_effect
a simplex of length 7 describing how the hospital admissions are spread out over a week, starting at Monday = 1
- i0_over_n
the initial per capita infections in the state
- initial_growth
exponential growth rate during the unobserved time
- sd_in_lab_level_multiplier
standard deviation in the log of the site-lab level multiplier, which determines how much systematic variation there is in site-labs from the state mean
- mean_obs_error_in_ww_lab_site
mean day-to-day variation in observed wastewater concentrations across all lab-sites
- mean_reporting_freq
mean frequency of wastewater measurements across sites, in samples per day (e.g. 1/7 is once per week)
- sd_reporting_freq
standard deviation in the frequency of wastewater measurements across sites
- mean_reporting_latency
mean time from forecast date to last wastewater sample collection date, across sites
- sd_reporting_latency
standard deviation in the time from the forecast date to the last wastewater sample collection date, across sites
- mean_log_lod
mean log of the LOD in each lab-site
- sd_log_lod
standard deviation in the log of the LOD across sites
- example_params_path
path to the toml file with the parameters to use to generate the simulated data
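The constraints stated in the argument descriptions can be checked against the defaults with a few lines of base R. This is a sketch, not the package's own validation: the rule that map_site_to_lab has one entry per lab-site is inferred from the default values, and the weekday lookup via `format(dates, "%u")` is one way to build a Monday = 1 index, which may differ from what the function does internally.

```r
# Default values copied from the Usage section.
n_sites <- 4
ww_pop_sites <- c(4e5, 2e5, 1e5, 5e4)
n_lab_sites <- 5
map_site_to_lab <- c(1, 1, 2, 3, 4)
hosp_wday_effect <- c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7
r_in_weeks <- c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16)
sim_start_date <- as.Date("2023-10-30")

checks <- c(
  one_pop_per_site = length(ww_pop_sites) == n_sites,
  lab_sites_cover_sites = n_lab_sites >= n_sites,
  # Assumption inferred from the defaults: one entry per lab-site,
  # each entry being a valid site index.
  one_site_per_lab_site = length(map_site_to_lab) == n_lab_sites,
  site_indices_valid = all(map_site_to_lab %in% seq_len(n_sites)),
  # A simplex of length 7: non-negative and summing to 1.
  wday_effect_is_simplex = length(hosp_wday_effect) == 7 &&
    all(hosp_wday_effect >= 0) &&
    isTRUE(all.equal(sum(hosp_wday_effect), 1))
)
print(checks)

# Number of weekly R(t) values supplied by the default.
n_weeks <- length(r_in_weeks)

# Map each simulated date to its weekday effect, with Monday = 1.
dates <- seq(sim_start_date, by = "day", length.out = 14)
wday_index <- as.integer(format(dates, "%u")) # ISO weekday number
daily_effect <- hosp_wday_effect[wday_index]
```

Because the default sim_start_date falls on a Monday, the first seven entries of `daily_effect` reproduce hosp_wday_effect in order.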
Value
A list containing two data frames. example_df contains all the columns needed to build the Stan data for the infection dynamics model. It has a row for every site-lab-day combination, with NA where wastewater concentrations are not observed, so hospital admissions are repeated once per site-lab. param_df is a single-row data frame of all the static parameters used to generate the simulated data.
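To illustrate the long layout described for example_df, the sketch below builds a toy data frame with one row per site-lab-day combination. The column names (`t`, `lab_site_index`, `daily_hosp_admits`, `log_conc`) and all values are hypothetical and chosen for illustration; the real output may use different names and contain more columns.

```r
# Toy dimensions and made-up admissions, for illustration only.
n_lab_sites <- 3
n_days <- 5
hosp <- c(2, 0, 1, 3, 1)

# One row per (day, lab-site) combination.
example_layout <- expand.grid(
  t = seq_len(n_days),
  lab_site_index = seq_len(n_lab_sites)
)

# Admissions are state-level, so the same daily value repeats in every lab-site.
example_layout$daily_hosp_admits <- hosp[example_layout$t]

# Wastewater is observed only on some days (here days 1 and 4); NA elsewhere.
example_layout$log_conc <- NA_real_
observed <- example_layout$t %in% c(1, 4)
example_layout$log_conc[observed] <- seq(3.0, 5.5, by = 0.5)

# Recover a single copy of the admissions time series before summarising it.
hosp_only <- unique(example_layout[, c("t", "daily_hosp_admits")])
```

Since admissions are duplicated across site-labs in this layout, summing the admissions column directly would overcount; deduplicating by day first, as in `hosp_only`, avoids that.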