Generate simulated data from the underlying model's generative process
Source: R/generate_simulated_data.R
Generates hospital admissions and site-level wastewater data directly from the generative model, under conditions and parameters specified by the user.
Usage
generate_simulated_data(
r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
n_sites = 4,
ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
pop_size = 3e+06,
site = c(1, 1, 2, 3, 4),
lab = c(1, 2, 3, 3, 3),
ot = 90,
nt = 9,
forecast_horizon = 28,
sim_start_date = lubridate::ymd("2023-09-01"),
hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7,
i0_over_n = 5e-04,
initial_growth = 1e-04,
sd_in_lab_level_multiplier = 0.25,
mean_obs_error_in_ww_lab_site = 0.2,
mean_reporting_freq = 1/5,
sd_reporting_freq = 1/20,
mean_reporting_latency = 7,
sd_reporting_latency = 3,
mean_log_lod = 5,
sd_log_lod = 0.2,
global_rt_sd = 0.03,
sigma_eps = 0.05,
sd_i0_over_n = 0.5,
if_feedback = FALSE,
subpop_phi = c(25, 50, 70, 40, 100),
input_params_path = fs::path_package("extdata", "example_params.toml", package =
"wwinference")
)
Arguments
- r_in_weeks
vector indicating the mean weekly R(t) that drives infection dynamics at the state level. Random noise is added to these values to produce week-to-week variation.
- n_sites
integer indicating the number of sites
- ww_pop_sites
vector indicating the population size in the catchment area in each of those sites (order must match)
- pop_size
integer indicating the population size in the hypothetical state, default is 3e6
- site
vector of integers indicating which site (WWTP) each separate lab-site observation comes from
- lab
vector of integers indicating which lab the lab-site observations come from
- ot
integer indicating the observed time: length of hospital admissions calibration time in days
- nt
integer indicating the nowcast time: length of time between last hospital admissions date and forecast date in days
- forecast_horizon
integer indicating the duration of the forecast in days e.g. 28 days
- sim_start_date
character string in ISO8601 format YYYY-MM-DD indicating the start date of the simulation, used to get a weekday vector
- hosp_wday_effect
a vector that is a simplex of length 7 describing how the hospital admissions are spread out over a week, starting at Monday = 1
- i0_over_n
float between 0 and 1 indicating the initial per capita infections in the state
- initial_growth
float indicating the exponential growth rate in infections (daily) during the unobserved time
- sd_in_lab_level_multiplier
float indicating the standard deviation in the log of the site-lab level multiplier determining how much variation there is systematically in site-labs from the state mean
- mean_obs_error_in_ww_lab_site
float indicating the mean day-to-day variation in observed wastewater concentrations across all lab-sites
- mean_reporting_freq
float indicating the mean frequency of wastewater measurements across sites, per day (e.g. 1/7 is once per week)
- sd_reporting_freq
float indicating the standard deviation in the frequency of wastewater measurements across sites
- mean_reporting_latency
float indicating the mean time from forecast date to last wastewater sample collection date, across sites
- sd_reporting_latency
float indicating the standard deviation in the time from the forecast date to the last wastewater sample collection date, across sites
- mean_log_lod
float indicating the mean log of the LOD in each lab-site
- sd_log_lod
float indicating the standard deviation in the log of the LOD across sites
- global_rt_sd
float indicating the standard deviation of the random noise added to the passed-in weekly R(t) to add variability. Default is 0.03
- sigma_eps
float indicating the standard deviation between the log of the state R(t) and the log of the subpopulation R(t) across time, in log scale. Default is 0.05
- sd_i0_over_n
float indicating the standard deviation between log of initial infections per capita, default is 0.5
- if_feedback
Boolean indicating whether or not to include infection feedback in the infection process, default is FALSE, which sets the strength of the infection feedback to 0. If TRUE, this will apply an infection feedback drawn from the prior.
- subpop_phi
Vector of numeric values indicating the overdispersion parameter phi in the hospital admissions observation process in each subpopulation
- input_params_path
path to the toml file with the parameters to use to generate the simulated data
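Several of the arguments above have to agree with one another: ww_pop_sites needs one entry per site, site and lab are paired element by element, and hosp_wday_effect is described as a simplex of length 7. The following base R sketch checks those relationships for the default values. The helper check_sim_inputs() is hypothetical and not part of wwinference, and the check that the catchment populations sum to no more than pop_size is an assumption rather than something stated in this documentation.

```r
# Hypothetical helper, not part of wwinference: stops with an error if the
# arguments are mutually inconsistent, otherwise returns the number of
# distinct lab-site combinations.
check_sim_inputs <- function(n_sites, ww_pop_sites, pop_size, site, lab,
                             hosp_wday_effect) {
  stopifnot(
    # one catchment population per site
    length(ww_pop_sites) == n_sites,
    # site and lab are paired element-wise, one entry per lab-site
    length(site) == length(lab),
    # every site index refers to one of the n_sites sites
    all(site %in% seq_len(n_sites)),
    # assumption: catchments cannot cover more people than the state holds
    sum(ww_pop_sites) <= pop_size,
    # weekday effect is a simplex of length 7
    length(hosp_wday_effect) == 7,
    all(hosp_wday_effect >= 0),
    isTRUE(all.equal(sum(hosp_wday_effect), 1))
  )
  nrow(unique(data.frame(site = site, lab = lab)))
}

# Default values copied from the Usage section
n_lab_sites <- check_sim_inputs(
  n_sites = 4,
  ww_pop_sites = c(4e5, 2e5, 1e5, 5e4),
  pop_size = 3e6,
  site = c(1, 1, 2, 3, 4),
  lab = c(1, 2, 3, 3, 3),
  hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7
)
```

Running a check like this before calling generate_simulated_data() can make a mismatched site or lab vector easier to spot than an error raised deeper in the simulation.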
Value
a list containing three dataframes:
- hosp_data
a dataframe containing the number of hospital admissions by day for a theoretical US state, for the duration of the specified calibration period
- hosp_data_eval
a dataframe containing the number of hospital admissions by day for a theoretical US state, for the entire evaluation period
- ww_data
a dataframe containing the measured wastewater concentrations in each site alongside other metadata necessary for modeling that data
Examples
if (FALSE) { # \dontrun{
# Generate a simulated dataset from a hypothetical state with 6 sites and 3
# different labs (site 6 is sampled by both lab 2 and lab 3)
sim_data <- generate_simulated_data(
n_sites = 6,
site = c(1, 2, 3, 4, 5, 6, 6),
lab = c(1, 1, 1, 1, 2, 2, 3),
ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5),
pop_size = 2e6
)
hosp_data <- sim_data$hosp_data
ww_data <- sim_data$ww_data
} # }
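The arguments sim_start_date, ot, nt, forecast_horizon and hosp_wday_effect together determine the calendar of a simulation. The base R sketch below shows one way the default values could map onto dates and a per-day weekday effect. It is illustrative only and makes two assumptions not confirmed by this documentation: that the simulated window spans exactly ot + nt + forecast_horizon days, and that the first day of that window is sim_start_date itself.

```r
# Default values copied from the Usage section
sim_start_date <- as.Date("2023-09-01")
ot <- 90
nt <- 9
forecast_horizon <- 28
hosp_wday_effect <- c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7

# Assumption: the window covers calibration, nowcast and forecast days
n_days <- ot + nt + forecast_horizon
dates <- sim_start_date + 0:(n_days - 1)

# ISO weekday number, 1 = Monday ... 7 = Sunday, matching the
# "starting at Monday = 1" ordering of hosp_wday_effect
wday_index <- as.integer(format(dates, "%u"))

# Rescaling the simplex by 7 gives a multiplier that averages 1 over a week
daily_multiplier <- 7 * hosp_wday_effect[wday_index]

# Assumption: day 1 of the calibration period is sim_start_date
last_hosp_date <- sim_start_date + ot - 1
# nt is the gap between the last hospital admissions date and forecast date
forecast_date <- last_hosp_date + nt
```

With these defaults the first simulated day falls on a Friday, so the first element of daily_multiplier uses the fifth entry of hosp_wday_effect.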