Generate simulated data from the underlying model's generative process
Source: R/generate_simulated_data.R
Generates hospital admissions and site-level wastewater data directly from the generative model, under conditions and parameters specified by the user.
Usage
generate_simulated_data(
r_in_weeks = c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16),
n_sites = 4,
ww_pop_sites = c(4e+05, 2e+05, 1e+05, 50000),
pop_size = 3e+06,
site = c(1, 1, 2, 3, 4),
lab = c(1, 2, 3, 3, 3),
ot = 90,
nt = 9,
forecast_horizon = 28,
sim_start_date = lubridate::ymd("2023-09-01"),
hosp_wday_effect = c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99)/7,
i0_over_n = 5e-04,
initial_growth = 1e-04,
sd_in_lab_level_multiplier = 0.25,
mean_obs_error_in_ww_lab_site = 0.2,
mean_reporting_freq = 1/5,
sd_reporting_freq = 1/20,
mean_reporting_latency = 7,
sd_reporting_latency = 3,
mean_log_lod = 5,
sd_log_lod = 0.2,
global_rt_sd = 0.03,
sigma_eps = 0.05,
sd_i0_over_n = 0.5,
if_feedback = FALSE,
subpop_phi = c(25, 50, 70, 40, 100),
input_params_path = fs::path_package("extdata", "example_params.toml", package =
"wwinference")
)
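The default time-related arguments can be checked against each other with a little arithmetic. The following base R sketch is illustrative only: it copies the defaults from the signature above and assumes, for the sake of the example, that sim_start_date is the first day of the ot + nt + forecast_horizon window. How the function itself aligns dates internally is not described on this page.

```r
# Defaults copied from the usage signature above.
r_in_weeks <- c(rep(1.1, 5), rep(0.9, 5), 1 + 0.007 * 1:16)
ot <- 90
nt <- 9
forecast_horizon <- 28
sim_start_date <- as.Date("2023-09-01")

# Days covered by the calibration, nowcast, and forecast periods.
n_days <- ot + nt + forecast_horizon

# Days spanned by the weekly R(t) vector (one value per week).
n_days_rt <- 7 * length(r_in_weeks)

# Assumption for illustration: sim_start_date is day 1 of the n_days window.
dates <- seq(sim_start_date, by = "day", length.out = n_days)

# ISO weekday index with Monday = 1, the convention used by hosp_wday_effect.
wday_index <- as.integer(format(dates, "%u"))
```

With these defaults the weekly R(t) vector spans more days than the combined calibration, nowcast, and forecast window, so a shortened r_in_weeks should be checked the same way before use.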
Arguments
- r_in_weeks
vector indicating the mean weekly R(t) that drives infection dynamics at the state level. This gets jittered with random noise to add week-to-week variation.
- n_sites
integer indicating the number of sites
- ww_pop_sites
vector indicating the population size in the catchment area in each of those sites (order must match)
- pop_size
integer indicating the population size in the hypothetical state, default is 3e6
- site
vector of integers indicating which site (WWTP) each separate lab-site observation comes from
- lab
vector of integers indicating which lab the lab-site observations come from
- ot
integer indicating the observed time: length of hospital admissions calibration time in days
- nt
integer indicating the nowcast time: length of time between last hospital admissions date and forecast date in days
- forecast_horizon
integer indicating the duration of the forecast in days e.g. 28 days
- sim_start_date
character string in ISO8601 format YYYY-MM-DD indicating the start date of the simulation, used to get a weekday vector
- hosp_wday_effect
a vector that is a simplex of length 7 describing how the hospital admissions are spread out over a week, starting at Monday = 1
- i0_over_n
float between 0 and 1 indicating the initial per capita infections in the state
- initial_growth
float indicating the exponential growth rate in infections (daily) during the unobserved time
- sd_in_lab_level_multiplier
float indicating the standard deviation in the log of the site-lab level multiplier determining how much variation there is systematically in site-labs from the state mean
- mean_obs_error_in_ww_lab_site
float indicating the mean day-to-day variation in observed wastewater concentrations across all lab-sites
- mean_reporting_freq
float indicating the mean frequency of wastewater measurements across sites in per day (e.g. 1/7 is once per week)
- sd_reporting_freq
float indicating the standard deviation in the frequency of wastewater measurements across sites
- mean_reporting_latency
float indicating the mean time from forecast date to last wastewater sample collection date, across sites
- sd_reporting_latency
float indicating the standard deviation in the time from the forecast date to the last wastewater sample collection date, across sites
- mean_log_lod
float indicating the mean log of the limit of detection (LOD) in each lab-site
- sd_log_lod
float indicating the standard deviation in the log of the LOD across sites
- global_rt_sd
float indicating the standard deviation of the noise added to the passed-in weekly R(t) to add variability. Default is 0.03
- sigma_eps
float indicating the standard deviation between the log of the state R(t) and the log of the subpopulation R(t) across time, in log scale. Default is 0.05
- sd_i0_over_n
float indicating the standard deviation between the log of initial infections per capita, default is 0.5
- if_feedback
Boolean indicating whether or not to include infection feedback into the infection process, default is FALSE, which sets the strength of the infection feedback to 0. If TRUE, this will apply an infection feedback drawn from the prior.
- subpop_phi
Vector of numeric values indicating the overdispersion parameter phi in the hospital admissions observation process in each subpopulation
- input_params_path
path to the toml file with the parameters to use to generate the simulated data
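Several of the arguments above must agree with one another (for example, ww_pop_sites needs one entry per site, and site and lab describe the same lab-site observations). The sketch below is a hypothetical pre-flight check in base R, not part of wwinference; the helper name check_site_args and the rule that the catchment populations sum to no more than pop_size are assumptions made for illustration.

```r
# Hypothetical helper, not part of wwinference: checks that the site-related
# arguments are mutually consistent before they are passed on.
check_site_args <- function(n_sites, ww_pop_sites, pop_size, site, lab) {
  stopifnot(
    length(ww_pop_sites) == n_sites,  # one catchment population per site
    length(site) == length(lab),      # one lab entry per lab-site observation
    all(site %in% seq_len(n_sites)),  # site indices refer to existing sites
    sum(ww_pop_sites) <= pop_size     # assumption: catchments fit in the state
  )
  # One row per distinct lab-site combination.
  unique(data.frame(site = site, lab = lab))
}

# Defaults copied from the usage signature.
lab_sites <- check_site_args(
  n_sites = 4,
  ww_pop_sites = c(4e5, 2e5, 1e5, 5e4),
  pop_size = 3e6,
  site = c(1, 1, 2, 3, 4),
  lab = c(1, 2, 3, 3, 3)
)

# hosp_wday_effect is documented as a simplex of length 7: non-negative
# entries that sum to 1 (compared with a floating-point tolerance).
hosp_wday_effect <- c(0.95, 1.01, 1.02, 1.02, 1.01, 1, 0.99) / 7
is_simplex <- length(hosp_wday_effect) == 7 &&
  all(hosp_wday_effect >= 0) &&
  isTRUE(all.equal(sum(hosp_wday_effect), 1))
```

Running the same check on custom values before calling generate_simulated_data() can surface mismatched vector lengths early.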
Value
a list containing three dataframes:
- hosp_data
a dataframe containing the number of daily hospital admissions by day for a theoretical US state, for the duration of the specified calibration period
- hosp_data_eval
a dataframe containing the number of daily hospital admissions by day for a theoretical US state, for the entire evaluation period
- ww_data
a dataframe containing the measured wastewater concentrations in each site alongside other metadata necessary for modeling that data
Examples
if (FALSE) { # \dontrun{
# Generate a simulated dataset from a hypothetical state with 6 sites and 3
# different labs
sim_data <- generate_simulated_data(
n_sites = 6,
site = c(1, 2, 3, 4, 5, 6, 6),
lab = c(1, 1, 1, 1, 2, 2, 3),
ww_pop_sites = c(1e5, 4e5, 2e5, 1.5e5, 5e4, 3e5),
pop_size = 2e6
)
hosp_data <- sim_data$hosp_data
ww_data <- sim_data$ww_data
} # }