Pre-process wastewater input data, adding needed indices and flagging potential outliers

Usage

preprocess_ww_data(
  ww_data,
  conc_col_name = "log_genome_copies_per_ml",
  lod_col_name = "log_lod"
)

Arguments

ww_data: dataframe containing the following columns: site, lab, date, site_pop, a column for concentration, and a column for the limit of detection
conc_col_name: string indicating the name of the column containing virus genome concentration measurements in log genome copies per mL, default is log_genome_copies_per_ml
lod_col_name: string indicating the name of the column containing the limits of detection for each wastewater measurement, default is log_lod. Note that any values in the conc_col_name column that are equal to the limit of detection will be treated as below the limit of detection.
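
The note on values equal to the limit of detection can be illustrated with a minimal base R sketch. It only assumes that "below the limit of detection" means a less-than-or-equal comparison on the log scale, as the note describes. The vectors and the 0/1 encoding are hypothetical and are not taken from the package source.

```r
# Hypothetical measurements on the natural-log scale:
# one above the LOD, one exactly at the LOD, one below it.
log_conc <- log(c(345.2, 20, 10))
log_lod <- log(c(20, 20, 20))

# Assumption: values at or below the LOD are flagged with 1, all others with 0.
below_lod <- as.integer(log_conc <= log_lod)
below_lod
```

With `<=` rather than `<`, the measurement that sits exactly at the limit of detection is flagged along with the one below it.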

Value

A dataframe containing the same columns as ww_data, except that the column named by conc_col_name is renamed to log_genome_copies_per_ml and the column named by lod_col_name is renamed to log_lod_sewage. The following additional columns needed for the Stan model are added: lab_site_index, site_index, flag_as_ww_outlier, below_lod, lab_site_name, exclude
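
As a rough illustration of what the index columns represent, the sketch below builds consecutive integer indices for each unique site and each unique lab-site combination. This is an assumption about the general idea only: the package's actual indexing order and implementation may differ, and the helper vector lab_site is hypothetical.

```r
# Hypothetical input: two sites, both processed by the same lab.
site <- c(1, 1, 2, 2)
lab <- c(1, 1, 1, 1)

# Assumption: each unique site gets a consecutive integer index,
# numbered in order of first appearance.
site_index <- match(site, unique(site))

# Assumption: each unique lab-site combination gets its own index.
lab_site <- paste(lab, site, sep = "_")
lab_site_index <- match(lab_site, unique(lab_site))

data.frame(site, lab, site_index, lab_site_index)
```

With a single lab, the two indices coincide. They diverge once a site is measured by more than one lab.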

Examples

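# Two sites, each observed on two consecutive days and processed by one lab
ww_data <- tibble::tibble(
  date = lubridate::ymd(rep(c("2023-11-01", "2023-11-02"), 2)),
  site = c(rep(1, 2), rep(2, 2)),
  lab = c(1, 1, 1, 1),
  # Concentrations and limits of detection on the natural-log scale
  log_conc = log(c(345.2, 784.1, 401.5, 681.8)),
  log_lod = log(c(20, 20, 15, 15)),
  site_pop = c(rep(2e5, 2), rep(4e5, 2))
)
# Point the function at the custom column names used above
ww_data_preprocessed <- preprocess_ww_data(ww_data,
  conc_col_name = "log_conc",
  lod_col_name = "log_lod"
)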