Tutorial: Modeling Microbial Inactivation Kinetics with Disinfectant Demand in R

Version 1.8

Author

David A Holcomb

Published

September 3, 2024

Description

This tutorial provides instructions for conducting microbial disinfection kinetics analyses in R, including detailed derivations of the kinetic models used. The methods described in this tutorial were implemented to analyze chlorine disinfection kinetics of Elizabethkingia spp., as described in the manuscript:

Holcomb DA, Riner D, Cowan B, Salah Z, Jennings WC, Mattioli MC, Murphy JL. (2024). Chlorine inactivation of Elizabethkingia species in water. Emerging Infectious Diseases, 30 (10). https://doi.org/10.3201/eid3010.240917.

A companion R script, kinetics_functions.R, abstracts all the functions developed in this tutorial. It can be sourced into an R session to make the functions available in the global environment to conduct kinetic analyses. Two accompanying datasets, ek_survival.csv and chlorine_dose.csv, are also provided to replicate the example analysis from this tutorial. All files are available in the manuscript repository.

Background

Our goal is to estimate CT values—the minimum contact time at a given initial disinfectant concentration, in \(mg \cdot min \cdot L^{-1}\), to achieve a desired log-reduction in the quantity of a particular microbe. CT values are not measured directly but rather are inferred from a modeled decay rate estimated from experimental data. Different assumptions about the functional form of the disinfection kinetic relationship result in competing decay rate models that may yield differing CT value estimates. CT values are also temperature- and pH-dependent, but we will not account for those factors in the models and instead assume that everything is conducted at constant temperature and pH (which is typically the case for the sorts of experimental data these models are fit to).

Because CT values assume a constant disinfectant dose, we do want to account disinfectant demand in the experimental systems that reduce the disinfectant concentration over time when estimating the inactivation rate constants. The standard practice in the field is simply to assume a constant log-linear decay of the disinfectant in time, which is estimated from the measured disinfectant concentrations at several time points during the disinfection experiments. The disinfectant decay estimation uncertainty is ignored in the inactivation kinetic models.

Define Models

We will use three kinetic models: first order (log-linear) decay for disinfectant demand, the Chick-Watson pseudo-first order rate law (with disinfectant demand), and the Hom model (with disinfectant demand) for non-linear log-survival [1]. In the following, we derive the model equations and implement the models as R functions.

Setup

Load packages and set options.

## Options
options(tidyverse.quiet = TRUE)
options(dplyr.summarise.inform = FALSE)
options(knitr.kable.NA = '')

## Packages
# Basic
library(knitr)
library(quarto)
library(data.table)

# Nonlinear models
library(gslnls)

# Tidyverse
library(broom)
library(patchwork)
library(tidyverse)  # should go last to avoid conflicts (particularly with data.table)

1. Disinfectant demand (first order decay)

Assuming first order decay provides for a log-linear reduction in disinfectant concentration determined by a single rate constant \(k'\) [2]. The disinfectant concentration \(C\) at contact time \(t\) is given by

\[ \begin{align} C &= C_0 \ \mathrm{e}^{-k't} \\ \ln\left(\frac{C}{C_0}\right) &= -k't \end{align} \tag{1}\]

for initial disinfectant concentration \(C_0\). \(k'\) is obtained as the regression coefficient estimate from a simple linear regression with the log-survival ratio \(\ln\left(\frac{C_t}{C_0}\right)\) as the response (dependent) variable and contact time \(t\) as the predictor (independent) variable:

\[ \begin{align} \ln\left(\frac{C}{C_0}\right)_i &= \beta t_i + \epsilon_i \\ \beta &= -k' \\ \epsilon_i &\sim \mathcal{N}\left(0, \sigma_C \right) \end{align} \tag{2}\]

where \(\epsilon_i\) is the independent, normally distributed residual for observation \(i\) and \(\sigma_C\) is the residual standard deviation of the model fit.

2. Chick-Watson rate law (pseudo-first order)

Ignoring disinfectant demand

In a disinfectant demand-free system, the Chick-Watson rate law describes linear log-survival governed by two parameters: a decay constant \(k\) (analogous to \(k'\) in the first-order disinfectant decay) and the coefficient of dilution \(n\), an empirical constant representing the average number of disinfectant molecules needed to inactivate a single organism [2,3]. For \(N\) viable organisms at time \(t\) and disinfectant concentration \(C\), the rate of change in organism concentration is given by

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d}t} &= -k N C^n \\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k C^n \int_{0}^{t} \mathrm{d}t \\ \ln\left(\frac{N}{N_0}\right) &= -k C^n t \end{align} \tag{3}\]

where \(N_0\) is the starting concentration of the organism. Solving the Chick-Watson model may be accomplished using non-linear least squares (NLS) regression with the log-survival ratio \(\ln\left(\frac{N}{N_0}\right)\) as the outcome and a function \(g \left( \cdot \right)\) representing the Chick-Watson rate law as the predictor:

\[ \begin{align} \ln\left(\frac{N}{N_0}\right)_i &= g \left(x_i, \theta \right) + \epsilon_i \\ g \left( x_i, \theta \right) &= -k C^n t \\ \epsilon_i &\sim \mathcal{N}\left( 0, \sigma_N \right) \end{align} \tag{4}\]

where \(x_i\) represents the data for observation \(i\) passed to the function (in this case, disinfectant concentration \(C\) and contact time \(t\)) and \(\theta\) are the unknown parameters we wish to solve for (here, rate constant \(k\) and coefficient of dilution \(n\)). Because we are assuming no disinfectant demand, \(C\) is the initial chlorine dose, \(C_0\).

First we’ll define an R function for the Chick-Watson rate law, which evaluates to the log-survival ratio corresponding to given input values of the parameters \(k\) and \(n\) and data \(t\) and \(C_0\)

## Chick-Watson rate law (no demand)
# parameters of interest (k, n) are passed log-transformed for computational reasons
law_cw <- function(c0, time, lnk, lnn){
  
  k <- exp(lnk)
  n <- exp(lnn)
  
  lnS <- -k * (c0^n) * time
  
  return(lnS)
  
}

With disinfectant demand

Assuming first-order disinfectant decay with rate constant \(k'\), we can substitute Equation 1 into the Chick-Watson rate law (Equation 3) to obtain:

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d}t} &= -k N C_0^n \ \mathrm{e}^{-n k' t} \\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k C_0^n \int_{0}^{t} \mathrm{e}^{-n k' t} \ \mathrm{d}t \\ \ln\left(\frac{N}{N_0}\right) &= -\frac{k C_0^n}{n k'} \left(1 - \mathrm{e}^{-n k' t} \right) \end{align} \tag{5}\]

which predicts the log-survival ratio at contact time \(t\) accounting for log-linear loss of disinfectant over time [4,5].1

As before, we’ll express this equation as an R function that returns log-survival ratios for given values of the kinetic parameters (\(k\), \(k'\), and \(n\)), the initial chlorine dose \(C_0\), and contact time \(t\).

## Chick-Watson rate law with demand
# parameters of interest (k, n) are passed log-transformed for computational reasons
law_cw_demand <- function(c0, time, kprime, lnk, lnn){
  
  k <- exp(lnk)
  n <- exp(lnn)
  
  A <- (-k * c0^n) / (n * kprime)
  B <- 1 - exp(-n * kprime * time)
  
  lnS <- A * B
  
  return(lnS)
  
}

3. Hom model (non-linear log-survival)

The Hom model generalizes the Chick-Watson model to allow for non-linear log-survival ratios by introducing an additional parameter \(m\), the Hom dilution coefficient:

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d}t} &= -k m N C^n t^{m-1} \\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k m C^n \int_{0}^{t} t^{m-1} \mathrm{d}t \\ \ln\left(\frac{N}{N_0}\right) &= -k C^n t^{m} \end{align} \tag{6}\]

Values of \(m > 1\) allow for a “shoulder” in the survival curve, where disinfection initially proceeds more slowly before accelerating; conversely, values of \(m < 1\) induce “tailing off” of the survival curve, with initial rapid disinfection that slows over time. The chick-Watson rate law is recovered as a special case for \(m = 1\) [2,3].

Incorporating disinfectant demand into the Hom model yields:

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d}t} &= -k m N C_0^n \ \mathrm{e}^{-n k' t} t^{m-1} \\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k m C_0^n \int_{0}^{t} \mathrm{e}^{-n k' t} t^{m-1} \mathrm{d}t \\ \ln\left(\frac{N}{N_0}\right) &= -k m C_0^n \int_{0}^{t} \mathrm{e}^{-n k' t} t^{m-1} \mathrm{d}t \end{align} \tag{7}\]

Unfortunately, the integral on the right hand side does not have a simple closed form solution [2]. However, there is a clever trick to bypass the integral using the lower incomplete gamma function [6]

\[ \gamma \left(\alpha, x \right) = \int_{0}^{x} \mathrm{e}^{-z} z^{\alpha - 1} \mathrm{d}z \quad \quad \alpha >0, \ x \geq 0 \tag{8}\]

(The gamma function proper, \(\Gamma(\alpha)\), is defined as the integral from zero to infinity; by integrating from zero to \(x\), we obtain only the lower tail of the gamma function, thus “lower incomplete”.)

Letting \(x = z = n k' t\) and integrating, we obtain:

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d} \left(nk't \right)} &= \frac{-k m N C_0^n \ \mathrm{e}^{-n k' t} t^{m-1}}{n k'} \\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k m C_0^n \int_{0}^{x} \frac{\mathrm{e}^{-z} t^{m-1}}{n k'} \ \mathrm{d}z \\ \ln\left(\frac{N}{N_0}\right) &= \frac{-k m C_0^n}{n k'} \int_{0}^{x} \mathrm{e}^{-z} t^{m-1} \mathrm{d}z \end{align} \]

Because \(t = \frac{z}{n k'}\), we can substitute for \(t\)

\[ \begin{align} \ln\left(\frac{N}{N_0}\right) &= \frac{-k m C_0^n}{n k'} \cdot \frac{1}{ \left(n k' \right)^{m-1}} \int_{0}^{x} \mathrm{e}^{-z} z^{m-1} \mathrm{d}z \\ &= \frac{-k m C_0^n}{ \left(n k' \right)^{m}} \int_{0}^{x} \mathrm{e}^{-z} z^{m-1} \mathrm{d}z \end{align} \]

such that the integral on the right hand side is now the lower incomplete gamma function with \(\alpha = m\) [2]. Because the incomplete gamma function is readily evaluated in R (function pgamma()),2 substitution gives us a tractable Hom model with first order disinfectant demand:

\[ \begin{align} \ln\left(\frac{N}{N_0}\right) &= \frac{-k m C_0^n}{ \left(n k' \right)^{m}} \cdot \gamma \left(m, \ n k' t \right) \quad \quad m > 0, \ n k' t \geq 0 \end{align} \tag{9}\]

As before, we’ll write an R function that implements the incomplete gamma Hom model and returns the log-survival ratio:

## Hom model with demand
# parameters of interest (k, n, m) are passed log-transformed for computational reasons
law_hom_demand <- function(c0, time, kprime, lnk, lnn, lnm){
  
  k <- exp(lnk)
  n <- exp(lnn)
  m <- exp(lnm)
  
  A <- (-k * m * (c0^n)) / ((n * kprime)^m)
  B <- pgamma(n * kprime * time, m) 
  
  lnS <- A * B
  
  return(lnS)
}

Fit Models

Non-linear least squares (NLS)

The kinetic models are fit by NLS, which requires specifying starting values for the unknown parameters. The base R nls() function employs an algorithm that is not very robust to non-optimal starting values, small sample sizes, and poor model geometries, all of which we commonly encounter in inactivation experiment data. It also is limited in the methods that can be applied to the fitted model for further analysis and interpretation. Let’s write an NLS fitting function to use a more robust optimization method that also provides additional post-processing support.

We can also calculate additional model performance indicators, specifically \(R^2\) and a corollary more appropriate for non-linear models, the deviance explained. Deviance measures how well the proposed model fits the data compared with the saturated model that fits the data exactly, in terms of the difference in their log-likelihoods:

\[ D := -2 \left[\ell \left( \hat{\beta} \right) - \ell_s \right] \]

for model log-likelihood \(\ell \left( \hat{\beta} \right)\) and saturated log-likelihood \(\ell_s\). We can compare the deviance of a proposed model to the null deviance, the deviance of the simplest possible model with a single parameter fit to the same data: \(D_0 := -2 \left[\ell \left( \hat{\beta}_0 \right) - \ell_s \right]\). The null model is traditionally the intercept-only model, in which only a constant value equal to the mean of the outcome data \(\bar{y}\) is estimated. In ordinary least squares (OLS) regression, the deviance is equivalent to the residual sum of squares (RSS) \(D = RSS = \sum \left(y_i - \hat{y_i} \right)^2\) and the null deviance corresponds to the total sum of squares (TSS) \(D_0 = TSS = \sum \left(y_i - \bar{y} \right)^2\). The classical coefficient of determination, \(R^2\), is calculated as:

\[ R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{D}{D_0} \]

In the OLS case, \(R^2\) can be interpreted as the proportion of variance explained by the model. This interpretation no longer holds for nonlinear models, but the same ratio of deviance to the null deviance—the deviance explained—can still be used to assess how close the proposed model is to perfectly reproducing the training data relative to the simplest null model. In NLS, however, the traditional intercept-only null model may not be nested within the proposed model and may therefore be an inappropriate point of comparison for calculating deviance explained. Both the Chick-Watson and Hom models fix the intercept at zero, for example, so an intercept-only model that estimates a non-zero intercept would not be nested within either kinetic model. We might instead fit a single-parameter model that is a linear function of time with the intercept fixed at zero—the same log-linear decay model we used to estimate disinfectant demand. This represents the simplest model fully nested within the kinetic models: the Chick-Watson model is obtained for the Hom model with \(m = 1\), and the linear-in-time model is simply the Chick-Watson model fixed with \(n = 0\) and \(k' = 0\)

\[ \begin{align} \frac{\mathrm{d}N}{\mathrm{d}t} &= -k N C_0^0 \mathrm{e}^{-(0) (0) t} = -k N C_0^0 \mathrm{e}^{0} = -k N (1) (1) \\ \frac{\mathrm{d}N}{\mathrm{d}t} &= -k N\\ \int_{N_0}^{N} \frac{\mathrm{d}N}{N} &= -k \int_{0}^{t} \mathrm{d}t \\ \ln\left(\frac{N}{N_0}\right) &= -kt \end{align} \]

This model implies that log-survival begins at zero (by definition in all our models) and decays at a constant rate, regardless of disinfectant concentration. The ratio of the deviance of our proposed models to the deviance of this constant slope model tells us how much better the proposed models explain our data than if we just assumed a constant decrease in log-survival with time. As with the classical \(R^2\) metric, the deviance explained should not be used for model selection, as it will always increase as parameters are added to the model. We can implement both the classical \(R^2\) metric (which we’ll refer to as “naive” as it uses an inappropriate intercept-only null model) and the deviance explained using the linear-with-time null model in our revised NLS fitting function, along with the root mean square error (RMSE), an absolute measure of model fit closely related to the residual standard deviation: \(RMSE = \sqrt{ \frac{\sum \left(y_i - \hat{y_i} \right)^2} {n}}\).

Because we’ve added so many model summary metrics, let’s separate these out as their own function, which can either be called directly for an existing NLS model object, or called by the fitting function we’ll wrap around it. That means any updates will only have to be done in one place to propagate through everywhere else.

### function to extract estimates and metrics from kinetic model objects
## extract parameter estimates from existing gsl_nls model object
extract_kinetic <- function(fit,
                            form_null = NULL,
                            time_var = "time"){
  
  ## R-Squared
  y <- fit$m$lhs()              # original outcome data
  e_hat <- resid(fit)           # model residuals
  tss <- sum((y - mean(y))^2)   # total sum of squares
  rss <- sum((e_hat)^2)         # residual sum of squares
  r2 <- 1 - (rss / tss)         # R-squared (unreliable for NLS)
  
  ## root mean squared error
  rmse <- sqrt(mean(e_hat^2))
  
  ## Deviance explained
  # fit null model
  if(is.null(form_null)){
    out_var <- as.character(fit$call$formula[[2]])    # outcome variable name
    form_null <- str_c(out_var, " ~ -1 + ", time_var) # nested null model formula
    message("null model not specified \nfitting linear in time with intercept fixed at 0 \n")
  }
  
  form_null <- as.formula(form_null)
  
  fit_null <- glm(form_null,
                  data = augment(fit))
  
  d_null <- deviance(fit_null)        # null deviance (residual deviance of null model)
  d_resid <- deviance(fit)            # residual deviance of proposed model
  d_explain <- 1 - (d_resid / d_null) # deviance explained
  
  
  ## standard model stats
  stat <- glance(fit) %>%
    mutate(rmse = rmse,
           tss = tss,
           r2 = r2,
           dev_explain = d_explain,
           dev_null = d_null) %>%
    select(sigma, rmse,
           nobs, df_resid = df.residual,
           loglik = logLik, tss,
           deviance, dev_null, dev_explain,
           r2, aic = AIC, bic = BIC,
           converge = isConv)
  
  est_ci <- confint(fit)
  
  ### format estimates with stats
  est <- tidy(fit) %>%
    rename(est_ln = estimate,
           est_se_ln = std.error) %>%
    mutate(est_lo_ln = est_ci[,1],
           est_hi_ln = est_ci[,2],
           term = str_remove(term, "ln"),
           est = exp(est_ln),
           est_se = exp(est_se_ln),
           est_lo = exp(est_lo_ln),
           est_hi = exp(est_hi_ln)) %>%
    bind_cols(stat) %>%
    select(term,
           est, est_se, est_lo, est_hi,
           est_stat = statistic, est_pval = p.value,
           ends_with("_ln"),
           everything())
  
  return(est)
  
}

Now we can define a function to actually fit the models by NLS and either return the model object or a flat dataframe with the parameter estimates and fit metrics extracted from a call to the extract_kinetic() function defined above. We will use the gsl_nls() function from the gslnls package to fit the models with the recommended Levenberg-Marquardt (scale = "levenberg") method [7].

We pass the non-linear kinetic model functions defined earlier as part of the model formula call, e.g. formula = log_survival ~ law_cw_demand(c0, time, kprime, lnk, lnn). Values for time, disinfectant dose, and the disinfectant demand rate constant \(k'\) are passed as data (but still included in the formula call, as they are named arguments to the kinetic model functions). Starting values must be supplied as a named list for each unknown parameter on the log scale.

### function to fit nonlinear kinetic models with nonlinear least squares
# argument `flat = TRUE` returns formatted estimates in dataframe, not model object 
fit_kinetic <- function(data,
                        formula,
                        start,
                        control = list(scale = "levenberg"),
                        flat = TRUE,
                        form_null = NULL,
                        time_var = "time"){
  
  ## ensure model formula is correctly formatted
  form <- as.formula(formula)
  
  ## fit NLS model
  fit <- gsl_nls(form,
                 data = data,
                 start = start,
                 trace = FALSE,
                 control = control)
  
  ## return model object?
  if(!flat){
    
    return(fit)
    
  } else {
    
    est <- extract_kinetic(fit,
                           form_null = form_null,
                           time_var = time_var)
    
    return(est)
    
  }
}

Log-linear disinfectant decay

Let’s also define a function to fit the log-linear (first order) disinfectant demand model and estimate \(k'\). For compatibility with the NLS outputs, we’ll calculate many of the same model fit metrics.

## custom function to estimate log-linear decay rate constant k'
# argument `flat = TRUE` returns formatted estimates in dataframe, not model object

fit_loglin <- function(data,
                       formula,
                       flat = TRUE){
  
  form <- as.formula(formula)
  
  fit <- lm(form, data = data)
  
  
  if(!flat){
    
    return(fit)
    
  } else {
    
    ## fit metrics calcs  
    y <- fit$model[,1]            # original outcome data
    e_hat <- resid(fit)           # model residuals
    tss <- sum((y - mean(y))^2)   # total sum of squares
    rss <- sum((e_hat)^2)         # residual sum of squares
    r2 <- 1 - (rss / tss)         # R-squared (unreliable for NLS)
    rmse <- sqrt(mean(e_hat^2))
    dev_null <- glm(form, data = data)$null.deviance
    
    stat <- glance(fit) %>%
      mutate(tss = tss,
             dev_null = dev_null,
             r2 = r2,
             rmse = rmse) %>%
      select(sigma, rmse,
             nobs, df_resid = df.residual,
             loglik = logLik, tss,
             deviance, dev_null, dev_explain = r.squared,
             r2, aic = AIC, bic = BIC)
    
    est <- tidy(fit, conf.int = TRUE) %>%
      select(term,
             est = estimate,
             est_se = std.error,
             est_lo = conf.low,
             est_hi = conf.high,
             est_stat = statistic,
             est_pval = p.value) %>%
      bind_cols(stat)
    
    return(est)
    
  }
}

This function will return estimates for the residual standard deviation (\(\sigma_C\)) and the regression coefficient on contact time, which corresponds to the decay rate constant \(-k'\) (note the sign on \(\beta\) includes the minus sign on \(k'\) in Equation 2, so \(k' = - \beta\)). An intercept term will also be included in the model by default. However, the y-intercept should be exactly zero according to our model (at time \(t = 0\), \(\ln\left(\frac{N}{N_0} \right) = \ln\left(\frac{N_0}{N_0} \right) = \ln\left(1 \right) = 0\)), so it may be appropriate to exclude the intercept term by specifying -1 on the right hand side of the model formula (e.g., log_survival ~ -1 + time). Note that the observations at time zero should probably be excluded from the model for related reasons (i.e., we assume the log-survival ratio at zero contact time is known to be exactly 0).

Interpret Models

Predictions

The fitted models allow us to generate model-based predictions of the log-survival ratio for a given contact time and disinfectant dose. We have already used model predictions at the values of our observed data to calculate the classical \(R^2\) and RMSE using the model residuals. We can also make predictions across a range of contact times to plot the general shape of our fitted inactivation curves, along with prediction intervals to characterize the uncertainty of our model-derived inactivation curves. We will generate the prediction intervals with an asymptotic approach, which assumes the models are well behaved and applies standard large-sample theory to obtain the intervals. While the assumptions may not exactly hold for typical inactivation experiment data, the asymptotic approach it is fast, stable, and predictable, unlike other approaches (e.g., profiling) that frequently fail for these weakly-identified kinetic models. Though the asymptotic prediction intervals should be interpreted cautiously, in practice they provide a reasonable and consistent picture of the uncertainty in our kinetic models.

## predict log-survival from existing GSL_NLS model object
pred_kinetic <- function(fit,
                         newdata = NULL,
                         interval = "prediction",
                         level = 0.95){
  
  
  if(is.null(newdata)) newdata = augment(fit)
  
  # asymptotic predictions
  pred_raw <- predict(fit,
                      newdata = newdata,
                      interval = interval,
                      level = level) %>%
    as_tibble()
  
  pred <- bind_cols(newdata,
                    pred_raw) %>%
    rename(pred_avg = fit,
           pred_lo = lwr,
           pred_hi = upr)
  
  return(pred)
}

Contact time

So far we have been estimating rate constants, which are difficult to interpret directly. Instead, we usually evaluate disinfection kinetics in terms of the contact time required to achieve a desired reduction in microbial concentration for a given disinfectant dose. A common target is 3 log10 reduction, which corresponds to a 99.9% decrease in microbial concentration. We can also calculate contact times for other reasonable reduction targets and disinfectant doses to be presented as a CT table. We can calculate the contact time for a given log-survival ratio and disinfectant dose from a fitted model by rearranging the Chick-Watson rate law (Equation 5):

\[ \begin{align} \ln\left(\frac{N}{N_0}\right) &= -k C^n t \\ t &= \frac{\ln\left(\frac{N}{N_0}\right)}{-k \ C^n} \end{align} \tag{10}\]

Note that contact times are calculated assuming a constant disinfectant dose and do not account for disinfectant demand. However, the values of the parameters used to calculate CT values were estimated accounting for disinfectant demand and could perhaps be thought of as demand-adjusted disinfection rates. As such, Hom model-derived CT values are similarly simple to obtain:

\[ \begin{align} \ln\left(\frac{N}{N_0}\right) &= -k C^n t^{m} \\ t &= \left( \frac{\ln\left(\frac{N}{N_0}\right)}{-k \ C^n} \right)^{1/m} \end{align} \tag{11}\]

Because the Hom model simplifies to the Chick-Watson model for \(m = 1\), we can define a single function to calculate the contact time needed for a given log-reduction using parameters fit using either model:

calc_time <- function(lnS, c0, k, n, m = 1){
  
  tm <- lnS / (-k * c0^n)
  
  t <- tm^(1/m)
  
  return(t)
  
}

CT tables

The contact times can be calculated for different log-reductions as specific disinfectant doses to obtain a table of CT values, the product of disinfectant dose and contact time (at constant temperature and pH) usually used to communicate disinfection requirements for a particular microbe for use by treatment operators.

## prepare table of CT values from model fit for given disinfectant dose
tab_ct <- function(est, .dose = 0.2){
  
  cols = c(m = 1)
  
  est %>%
    select(model, nobs,
           aic, rmse, dev_explain,
           term, est) %>%
    pivot_wider(names_from = term,
                values_from = est) %>%
    tibble::add_column(., !!!cols[!names(cols) %in% names(.)]) %>%
    mutate(m = replace_na(m, 1),
           dose = .dose,
           ct_log2 = .dose * calc_time(lnS = log(0.01), c0 = .dose, k = k, n = n, m = m),
           ct_log3 = .dose * calc_time(lnS = log(0.001), c0 = .dose, k = k, n = n, m = m),
           ct_log4 = .dose * calc_time(lnS = log(0.0001), c0 = .dose, k = k, n = n, m = m),
           m = ifelse(model == "Chick-Watson", NA, m))

}

Comparing rate constant estimates

If we have estimated separate rate constants (the \(k\) parameter estimates) for different subsets of the data then we can test whether the two estimates are statistically equivalent with a simple z-test. While the rate constants could come from different model types (e.g., one from Chick-Watson and the other from Hom), in practice the rate constant estimates from both models are wildly different (likely because Hom is poorly identified with its extra parameter) despite producing very similar log-survival predictions. Instead, it is most useful to focus on comparing parameter estimates from the same model fit to two non-overlapping subsets of the data (e.g., comparing rate constants between clinical and environmental strains of the the same organism).

As regression coefficients from non-linear least squares fits, we assume the rate constants are normally distributed and can test the null hypothesis that two rate constants estimated from separate data subsets are equal using a z-test with the test statistic given by

\[ z = \frac{k_1 - k_2}{\sqrt{se_{k_1}^2 + se_{k_2}^2}} \tag{12}\]

and the two-tailed probability obtained from the standard normal distribution. This is appropriate only if we can assume no covariance between the rate constants: \(\mathrm{cov}(k_1, k_2) = 0\); this assumption does not hold if some of the same data are used in both model fits. Therefore, while we can compare the rate constants fit separately to each species, we cannot compare the rate constant for one species to the rate constant estimated for all the data across species. (In practice it’s probably fine to do that anyway, but technically the denominator would be missing a \(-2 \ \mathrm{cov}(k_1, k_2)\) term that would be difficult to estimate for our NLS fits.)

## Function to z-test if rate constants estimated for different data subsets are different
# rate constants and SEs should genrally be on the log-scale
test_rates <- function(k1, k1_se,
                       k2, k2_se){
  
  mu <- k1 - k2
  
  se_pool <- sqrt(k1_se^2 + k2_se^2)
  
  stat <- mu / se_pool
  
  prob <- 2 * pnorm(-abs(stat))
  
  out <- tibble(k1 = k1,
                k2 = k2,
                k_diff = mu,
                se_pool = se_pool,
                stat = stat,
                pval = prob)
  
  return(out)
  
}

Example

We will demonstrate fitting and interpreting disinfection kinetics models with an example dataset from chlorine inactivation experiments for two species of the opportunistic waterborne bacterial pathogen Elizabethkingia spp.

Setup

All the functions have been saved in the companion script kinetics_functions.R, which can be sourced to make the functions available in the global environment. There are two datasets: the main Elizabethkinga inactivation experiments and the free-chlorine residual concentrations measured over time in the experimental flasks, from which we will estimate disinfectant demand.

## source analysis functions
source("../script/kinetics_functions.R")

## Data
# disinfectant decay
df_dis <- fread("../out/chlorine_dose.csv") 

# Ek survival
df_micro <- fread("../out/ek_survival.csv")

Compare Chick-Watson and Hom models

First we’ll fit both models to all the data and plot the predicted inactivation curves.

Fit

## Estimate disinfectant demand
est_dis <- df_dis %>%
  fit_loglin(formula = lnS ~ -1 + time) %>%
  mutate(term = ifelse(term == "time", "kprime", term),
         across(c(est, est_lo, est_hi), ~-1 * .x),
         model = "disinfectant")

# extract decay constant
kprime <- est_dis %>%
  filter(term == "kprime") %>%
  select(est) %>%
  pull()


### Fit kinetic models
## augment data
df_run <- df_micro %>%
  mutate(kprime = kprime)

# prediction grid
df_pred <- tibble(time = seq(0, 1, 0.01),
                  c0_dose = 0.2,
                  kprime = kprime)


## Fit models
# Chick-Watson
m_cw <- fit_kinetic(df_run,
                    lnS ~ law_cw_demand(c0_dose, time, kprime, lnk, lnn),
                    start = list(lnk = 8,
                                 lnn = log(2)),
                    flat = FALSE)

# Hom
m_hom <- fit_kinetic(df_run,
                     lnS ~ law_hom_demand(c0_dose, time, kprime, lnk, lnn, lnm),
                     start = list(lnk = 20,
                                  lnn = log(2),
                                  lnm = 0.01),
                     flat = FALSE)

## Estimates
est_cw <- extract_kinetic(m_cw) %>%
  mutate(model = "Chick-Watson")
null model not specified 
fitting linear in time with intercept fixed at 0 
est_hom <- extract_kinetic(m_hom) %>%
  mutate(model = "Hom")
null model not specified 
fitting linear in time with intercept fixed at 0 
est_all <- bind_rows(est_cw,
                     est_hom,
                     est_dis)


## Predictions
pred_cw <- pred_kinetic(m_cw,
                        newdata = df_pred) %>%
  mutate(model = "Chick-Watson")

pred_hom <- pred_kinetic(m_hom,
                         newdata = df_pred) %>%
  mutate(model = "Hom")

pred_all <- bind_rows(pred_cw,
                      pred_hom)

Tabulate

Construct a table of fit metrics, parameter estimates, and CT values for the two models.

## CT table
ct_all <- est_all %>%
  filter(model != "disinfectant") %>%
  tab_ct() %>%
  mutate(across(c(k, n, m), ~log(.x)))

ct_all %>%
  kable(digits = 2,
        col.names = c("Model", "N",
                      "AIC", "Deviance Explained", "RMSE",
                      "ln(k)", "ln(n)", "ln(m)",
                      "Cl Dose (mg/L)",
                      "99%",
                      "99.9%",
                      "99.99%")) %>%
  kableExtra::add_header_above(header = c(" " = 2,
                                          "Model Fit" = 3,
                                          "Parameter Estimates" = 3,
                                          " " = 1,
                                          "CT Value (min*mg/L)" = 3))
Model Fit
Parameter Estimates
CT Value (min*mg/L)
Model N AIC Deviance Explained RMSE ln(k) ln(n) ln(m) Cl Dose (mg/L) 99% 99.9% 99.99%
Chick-Watson 83 430.92 3.13 0.52 10.36 1.41 0.2 0.02 0.03 0.04
Hom 83 431.56 3.10 0.53 25.06 2.41 0.88 0.2 0.02 0.02 0.03

Plot

## Data
# predictions
pred_plot <- pred_all %>%
  mutate(logS = log10(exp(pred_avg)),
         lo = log10(exp(pred_lo)),
         hi = log10(exp(pred_hi)))

# data
data_plot <- df_run %>%
  mutate(logS = log10(exp(lnS)))


## Color Scales
col_model <- c("Chick-Watson" = "#bd0026",
               "Hom" = "#fd8d3c")


## Compare models for all data
pred_plot %>%
  mutate(model = fct_relevel(model, c("Chick-Watson", "Hom"))) %>%
  ggplot(data = .,
         mapping = aes(x = time, y = logS)) +
  geom_hline(aes(yintercept = -3),
             linetype = "dashed") +
  geom_point(data = data_plot,
             alpha = 0.8,
             position = position_jitter(width = 0.01),
             show.legend = TRUE) +
  geom_line(aes(color = model, linetype = model),
            linewidth = 1.25) +
  geom_ribbon(aes(ymin = lo, ymax = hi, fill = model,
                  color = model, linetype = model),
              alpha = 0.3) +
  annotate("text",
           x = 0.75, y = -3.1, size = 3,
           vjust = "top",
           label = "99.9% inactivation") +
  annotate("text",
           x = 0.95, y = -0.95, size = 3,
           hjust = "right", vjust = "bottom",
           label = "all data") +
  annotate("text",
           x = 0.05, y = -7.95, size = 3,
           hjust = "left", vjust = "bottom",
           label = "0.2 mg/L FCR") +
  scale_color_manual(values = col_model,
                     name = "Model") +
  scale_fill_manual(values = col_model,
                     name = "Model") +
  guides(color = guide_legend(override.aes = list(size = 2, linewidth = 0.5))) +
  scale_linetype_manual(values = c("solid", "longdash"),
                        name = "Model") +
  scale_x_continuous(limits = c(0, 1),
                     expand = expansion(add = c(0, 0.01)),
                     oob = scales::squish) +
  scale_y_continuous(limits = c(-8.5, 0),
                     breaks = c(0, -1, -2, -3, -4, -5, -6, -7, -8, -9),
                     labels = c(0, 1, 2, 3, 4, 5, 6, 7, 8 , 9),
                     expand = expansion(add = c(0.01, 0.01)),
                     oob = scales::squish) +
  labs(x = "contact time (min)",
       y = expression(log[10]~inactivation)) +
  theme_bw() 

Chick-Watson and Hom model inactivation curves

Compare species

We have two similar species of Elizabethkingia, E. anophelis and E. meningoseptica. We will fit the Chick-Watson model to each and compare the inactivation rate constant estimates between them.

The companion script already contains a convenience wrapper function analyze_kinetic() to perform the entire analysis procedure from above that can be applied to each data subset. This wrapper function was written using the variable names in our dataset and contains other analysis-specific assumptions, so it would need to be modified for more general use.

Fit

## EkA
# disinfectant decay
df_eka_dis <- df_dis %>%
  filter(species == "EkA")

df_eka <- df_micro %>%
  filter(species == "EkA")


## run analysis
run_eka <- analyze_kinetic(data = df_eka,
                           data_dis = df_eka_dis)

## extract estimates
est_eka <- run_eka$est %>%
  mutate(species = "E. anophelis")

## extract predictions
pred_eka <- run_eka$pred %>%
  mutate(species = "E. anophelis")

## extract data
data_eka <- run_eka$data %>%
  mutate(species = "E. anophelis")


## EkM
# disinfectant decay
df_ekm_dis <- df_dis %>%
  filter(species == "EkM")

df_ekm <- df_micro %>%
  filter(species == "EkM")


## run analysis
run_ekm <- analyze_kinetic(data = df_ekm,
                           data_dis = df_ekm_dis)


## extract estimates
est_ekm <- run_ekm$est %>%
  mutate(species = "E. meningoseptica")

## extract predictions
pred_ekm <- run_ekm$pred %>%
  mutate(species = "E. meningoseptica")

## extract data
data_ekm <- run_ekm$data %>%
  mutate(species = "E. meningoseptica")

Plot

## Data
# predictions
pred_combo <- bind_rows(pred_eka,
                        pred_ekm) %>%
  filter(model == "Chick-Watson") %>%
  mutate(logS = log10(exp(pred_avg)),
         lo = log10(exp(pred_lo)),
         hi = log10(exp(pred_hi)),
         species = fct_relevel(species, c("E. anophelis", "E. meningoseptica")))

# data
data_combo <- bind_rows(data_eka, data_ekm) %>%
  mutate(group = "species",
         species = fct_relevel(species, c("E. anophelis", "E. meningoseptica")),
         logS = log10(exp(lnS)))


## Color Scales
col_species <- c("E. anophelis" = "#253494",
                 "E. meningoseptica" = "#f768a1")


## Compare species with Chick-Watson Model
pred_combo %>%
  ggplot(data = .,
         mapping = aes(x = time, y = logS)) +
  geom_hline(aes(yintercept = -3),
             linetype = "dashed") +
  geom_point(aes(color = species, shape = species),
             data = data_combo,
             position = position_jitter(width = 0.01),
             alpha = 0.8) +
  geom_line(aes(color = species, linetype = species),
            linewidth = 1.25) +
  geom_ribbon(aes(ymin = lo, ymax = hi, fill = species,
                  color = species, linetype = species),
              alpha = 0.3) +
  annotate("text",
           x = 0.75, y = -3.1, size = 3,
           vjust = "top",
           label = "99.9% inactivation") +
  annotate("text",
           x = 0.95, y = -0.95, size = 3,
           hjust = "right", vjust = "bottom",
           label = "Chick-Watson model") +
  annotate("text",
           x = 0.05, y = -7.95, size = 3,
           hjust = "left", vjust = "bottom",
           label = "0.2 mg/L FCR") +
  scale_color_manual(values = col_species,
                     name = "Species") +
  scale_fill_manual(values = col_species,
                     name = "Species") +
  scale_shape_discrete(name = "Species") +
  guides(color = guide_legend(override.aes = list(size = 2, linewidth = 0.5))) +
  scale_linetype_manual(values = c("solid", "longdash"),
                        name = "Species") +
  scale_x_continuous(limits = c(0, 1),
                     expand = expansion(add = c(0, 0.01)),
                     oob = scales::squish) +
  scale_y_continuous(limits = c(-8.5, 0),
                     breaks = c(0, -1, -2, -3, -4, -5, -6, -7, -8, -9),
                     labels = c(0, 1, 2, 3, 4, 5, 6, 7, 8 , 9),
                     expand = expansion(add = c(0.01, 0.01)),
                     oob = scales::squish) +
  labs(x = "contact time (min)",
       y = expression(log[10]~inactivation)) +
  theme_bw() +
  theme(legend.text = element_text(face="italic"))

Inactivation curves by species (Chick-Watson model)

It looks like the predicted inactivation may be slightly different between the two species, based on the separation between the modeled inactivation curves in the plot. We should test whether the rate constant estimates are statistically different.

Compare rate constants

## By species
k_test_species <- bind_rows(est_eka,
                            est_ekm) %>%
  mutate(group = "species") %>%
  filter(term == "k",
         group == "species",
         model == "Chick-Watson") %>%
  group_by(group, model, term) %>%
  summarise(test = test_rates(est_ln[1], est_se_ln[1],
                              est_ln[2], est_se_ln[2]),
            .groups = "drop") %>%
  unnest(test)


# disinfectant demand - not logged (model used natural-scale rate parameter)
kprime_test_species <- bind_rows(est_eka,
                                 est_ekm) %>%
  mutate(group = "species") %>%
  filter(term == "kprime",
         group == "species") %>%
  group_by(group, model, term) %>%
  summarise(test = test_rates(est[1], est_se[1],
                              est[2], est_se[2]),
            .groups = "drop") %>%
  unnest(test)

## combine
k_test <- bind_rows(k_test_species,
                    kprime_test_species) %>%
  mutate(signif = ifelse(pval < 0.05, "yes", "no"))


## Table
k_test %>%
  kable(digits = 2,
        col.names = c("Group", "Model", "Parameter",
                      "Model 1", "Model 2", "Difference",
                      "Pooled SE", "z-score",
                      "p-value", "Significant (p<0.05)")) %>%
  kableExtra::add_header_above(header = c(" " = 3,
                                          "ln(rate constant)" = 3,
                                          " " = 4))
ln(rate constant)
Group Model Parameter Model 1 Model 2 Difference Pooled SE z-score p-value Significant (p<0.05)
species Chick-Watson k 10.25 9.99 0.26 2.52 0.10 0.92 no
species disinfectant kprime 0.90 0.88 0.02 0.17 0.09 0.93 no

Using a z-test with 95% confidence level, the disinfection and disinfectant demand rate constants are not significantly different by species. The apparent differences observed in the inactivation curve plot above are not reflected in the rate constant estimates, which are adjusted for the disinfectant decay rates (which were also not significantly different) and potentially different coefficients of dilution, the empirical constant \(n\) that is introduced in the Chick-Watson model.

References

1.
Murphy JL, Haas CN, Arrowood MJ, Hlavsa MC, Beach MJ, Hill VR. Efficacy of chlorine dioxide tablets on inactivation of Cryptosporidium oocysts. Environmental Science & Technology 2014, 48 (10), 5849–5856. https://doi.org/10.1021/es500644d.
2.
Gyürék LL, Finch GR. Modeling water treatment chemical disinfection kinetics. Journal of Environmental Engineering 1998, 124 (9), 783–793. https://doi.org/10.1061/(ASCE)0733-9372(1998)124:9(783).
3.
Coleman CK, Kim J, Bailey ES, Abebe LS, Brown J, Simmons OD, Sobsey MD. Bromine and chlorine disinfection of Cryptosporidium parvum oocysts, Bacillus atrophaeus spores, and MS2 coliphage in water. Environmental Science & Technology 2023, 57 (47), 18744–18753. https://doi.org/10.1021/acs.est.3c00536.
4.
Haas CN, Kaymak B. Effect of initial microbial density on inactivation of Giardia muris by ozone. Water Research 2003, 37 (12), 2980–2988. https://doi.org/10.1016/S0043-1354(03)00112-X.
5.
Lawler DF, Singer PC. Analyzing disinfection kinetics and reactor design: a conceptual approach versus the SWTR. Journal AWWA 1993, 85 (11), 67–76. https://doi.org/10.1002/j.1551-8833.1993.tb06103.x.
6.
Haas CN, Joffe Josh. Disinfection under dynamic conditions: modification of Hom’s model for decay. Environmental Science & Technology 1994, 28 (7), 1367–1369. https://doi.org/10.1021/es00056a028.
7.
Chau J. gslnls: GSL nonlinear least-squares fitting; R package version 1.2.0; 2023. https://github.com/JorisChau/gslnls.

 

Disclaimer:

The findings and conclusions of this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

 

Footnotes

  1. The equation provided for the Chick-Watson model with first-order disinfectant decay in Gyürék and Finch (1998) looks somewhat different and introduces new notation:

    \[ \ln \left( \frac{N}{N_0} \right) = - \frac{k}{k'n} \left(C_0^n - C_f^n \right) \]

    where \(C_f\) is described in an appendix table as the disinfectant residual after contact time \(t\). Perhaps this is algebraically equivalent, but as \(C_f\) is not clearly defined and \(t\) no longer explicitly appears in the model, it is not obvious how to fit the model with this parameterization.↩︎

  2. The pgamma() documentation states that it is closely related to the incomplete gamma function but differs by a normalizing factor, namely \(\Gamma \left(\alpha \right)^{-1}\). To recover exactly the incomplete gamma function as defined in Equation 8, one would specify pgamma(x, a) * gamma(a) in R, but it is not necessary for our purposes.↩︎