Fit a negative binomial branching process model with Bayesian inference via rstan

fit_nbbp_homogenous_bayes(
  all_outbreaks,
  censor_geq = rep(NA, length(all_outbreaks)),
  condition_geq = rep(1, length(all_outbreaks)),
  partial_geq = rep(NA, length(all_outbreaks)),
  partial_probs = rep(NA, length(all_outbreaks)),
  partial_size_max = NA,
  partial_size_max_error = 1e-05,
  shape_r_eff = nbbp::default_res,
  rate_r_eff = nbbp::default_res,
  sigma_inv_sqrt_dispersion = nbbp::default_sisd,
  iter = 5000,
  control = list(adapt_delta = 0.9),
  ...
)

Arguments

all_outbreaks

vector containing the size of each outbreak, including the index case

censor_geq

optional, possibly per-chain, censoring, see details.

condition_geq

optional, possibly per-chain, conditioning on minimum observed chain size, see details.

partial_geq

optional, possibly per-chain, observed chain sizes with binomial sampling, see details.

partial_probs

optional, possibly per-chain, probabilities for incomplete observation, see details.

partial_size_max

optional, when incomplete sampling is present, can be used to manually specify upper limit for marginalizing chain size probability; default of NA corresponds to using adaptive bound via partial_size_max_error.

partial_size_max_error

optional, when incomplete sampling is present, a dynamic bound is computed for marginalizing the observed chain size probability over the unseen true chain size such that the numerical error is no larger than (and typically much smaller than, this value, as discussed in the "Implementation details" vignette.

shape_r_eff

shape parameter of Gamma prior on r_eff.

rate_r_eff

rate parameter of Gamma prior on r_eff.

sigma_inv_sqrt_dispersion

scale of HalfNormal prior on 1 / sqrt(dispersion).

iter

number of iterations for rstan::sampling, default of 5000 intends to be conservative.

control

list for rstan::sampling, default attempts to set adapt_delta conservatively.

...

further values past to rstan::sampling.

Value

an rstan stan_fit object

Details

The likelihood is as described in dnbbp, without conditioning on extinction.

In particular, we fit a Bayesian model in rstan, assuming all cases are observed. The variables recorded in the stan fit object are (ignoring stan's ordering):

  1. r_eff, the effective reproduction number R;

  2. dispersion, k, the parameter controlling (over)dispersion;

  3. inv_sqrt_dispersion, 1 / sqrt(k);

  4. exn_prob the probability that the chain goes extinct for this R and k pair;

  5. p_0 the probability that one individual has 0 offspring for this R and k pair.

It is possible to condition the likelihood on only observing chains of at least a certain size. This may be done on a per-chain basis, if sampling processes vary. If the ith chain should be conditioned on seeing a chain of at least size m, set condition_geq[i] = m. NA for no conditioning (chains can be any size greater than or equal to 1).

Right-censoring is permitted on a per-chain basis. That is, if there is some size above which a chain can be said not to go extinct on its own, we can treat this as an observation of at least that size. If the ith chain size should be treated as censored at size m, set censor_geq[i] = m. The value of all_outbreaks[i] is irrelevant. NA for no censoring.

Alternately, rather than censored, chains may be treated as incompletely observed. In this case, each case in the chain is assumed to be observed with an independent probability given by the corresponding entry in partial_probs, corresponding to a binomial sampling model for the observed chain size. The likelihood of observing a chain of size c, with sampling probability p, from a chain of true size u, is the sum from u = c to infinity of Pr(c | u, p) Pr(u | R, k), where the first term is a binomial sampling probability and the second the completely-observed NBBP likelihood. The value of all_outbreaks[i] is irrelevant for binomially-sampled chains. NA for no binomial sampling. In practice, we truncate this sum to a maximum determined by .get_partial_ub.

Conditioning and/or right-censoring reveal that there are small numerical instabilities in the PMF which can make it sum to values ever so slightly larger than 1. This is handled by rescaling the CDF such that it does not exceed 1 in the range it needs to be evaluated.

We let r_eff ~ Gamma(shape = shape_r_eff, rate = rate_r_eff), and 1/sqrt(dispersion) ~ HalfNormal(sigma = sigma_inv_sqrt_dispersion). For more on the priors and their default values see vignette("default_priors").