Example: Early COVID-19 case data in South Korea

In this example we use EpiAware functionality to largely recreate an epidemiological model presented in On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective, Mishra et al (2020). Mishra et al consider test-confirmed cases of COVID-19 in South Korea between January to July 2020. The components of the epidemilogical model they consider are:

The time varying reproductive number modelled as an AR(2) process on the log-scale $\log R_t \sim \text{AR(2)}$.
The latent infection ($I_t$) generating process is a renewal model (note that we leave out external infections in this note):

$$I_t = R_t \sum_{s\geq 1} I_{t-s} g_s.$$

The discrete generation interval $g_t$ is a daily discretisation of the probability mass function of an estimated serial interval distribution for SARS-CoV-2:

$$G \sim \text{Gamma}(6.5,0.62).$$

Observed cases $C_t$ are distributed around latent infections with negative binomial errors:

$$C_t \sim \text{NegBin}(\text{mean} = I_t,~ \text{overdispersion} = \phi).$$

In the examples below we are going to largely recreate the Mishra et al model, whilst emphasing that each component of the overall epidemiological model is, itself, a stand alone model that can be sampled from.

Dependencies for this notebook

Now we want to import these dependencies into scope. If evaluating these code lines/blocks in REPL, then the REPL will offer to install any missing dependencies. Alternatively, you can add them to your active environment using Pkg.add.

using EpiAware

using Turing, DynamicPPL #Underlying Turing ecosystem packages to interact with models

using Distributions, Statistics #Statistics packages

using CSV, DataFramesMeta #Data wrangling

using CairoMakie, PairPlots, TimeSeries #Plotting backend

using ReverseDiff #Automatic differentiation backend

begin #Date utility and set Random seed
    using Dates
    using Random
    Random.seed!(1)
end

TaskLocalRNG()

Load early SARS-2 case data for South Korea

First, we make sure that we have the data we want to analysis in scope by downloading it for where we have saved a copy in the EpiAware repository.

NB: The case data is curated by the covidregionaldata package. We accessed the South Korean case data using a short R script. It is possible to interface directly from a Julia session using the RCall.jl package, but we do not do this in this notebook to reduce the number of underlying dependencies required to run this notebook.

url = "https://raw.githubusercontent.com/CDCgov/Rt-without-renewal/main/EpiAware/docs/src/showcase/replications/mishra-2020/south_korea_data.csv2"

"https://raw.githubusercontent.com/CDCgov/Rt-without-renewal/main/EpiAware/docs/src/showcase/replications/mishra-2020/south_korea_data.csv2"

data = CSV.read(download(url), DataFrame)

	Column1	date	cases_new	deaths_new
1	1	2019-12-31	0	0
2	2	2020-01-01	0	0
3	3	2020-01-02	0	0
4	4	2020-01-03	0	0
5	5	2020-01-04	0	0
6	6	2020-01-05	0	0
7	7	2020-01-06	0	0
8	8	2020-01-07	0	0
9	9	2020-01-08	0	0
10	10	2020-01-09	0	0
...
214	214	2020-07-31	36	1

Time-varying reproduction number as an `AbstractLatentModel` type

EpiAware exposes a AbstractLatentModel abstract type; the purpose of which is to group stochastic processes which can be interpreted as generating time-varying parameters/quantities of interest which we call latent process models.

In the Mishra et al model the log-time varying reproductive number $Z_t$ is assumed to evolve as an auto-regressive process, AR(2):

$$\begin{align} R_t &= \exp Z_t, \\ Z_t &= \rho_1 Z_{t-1} + \rho_2 Z_{t-2} + \epsilon_t, \\ \epsilon_t &\sim \text{Normal}(0, \sigma^*). \end{align}$$

Where $\rho_1,\rho_2$, which are the parameters of AR process, and $\epsilon_t$ is a white noise process with standard deviation $\sigma^*$.

In EpiAware we determine the behaviour of a latent process by choosing a concrete subtype (i.e. a struct) of AbstractLatentModel which has fields that set the priors of the various parameters required for the latent process.

The AR process has the struct AR <: AbstractLatentModel. The user can supply the priors for $\rho_1,\rho_2$ in the field damp_priors, for $\sigma^*$ in the field std_prior, and the initial values $Z_1, Z_2$ in the field init_priors.

We choose priors based on Mishra et al using the Distributions.jl interface to probability distributions. Note that we condition the AR parameters onto $[0,1]$, as in Mishra et al, using the truncated function.

In Mishra et al the standard deviation of the stationary distribution of $Z_t$ which has a standard normal distribution conditioned to be positive $\sigma \sim \mathcal{N}^+(0,1)$. The value $σ^*$ was determined from a nonlinear function of sampled $\sigma, ~\rho_1, ~\rho_2$ values. Since, Mishra et al give sharply informative priors for $\rho_1,~\rho_2$ (see below) we simplify by calculating $\sigma^*$ at the prior mode of $\rho_1,~\rho_2$. This results in a $\sigma^* \sim \mathcal{N}^+(0, 0.5)$ prior.

ar = AR(
    damp_priors = [truncated(Normal(0.1, 0.05), 0, 1),
        truncated(Normal(0.8, 0.05), 0, 1)],
    init_priors = [Normal(-1.0, 0.1), Normal(-1.0, 0.5)],
    ϵ_t = HierarchicalNormal(std_prior = HalfNormal(0.5))
)

AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}(Distributions.Product{Distributions.Continuous, Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}, Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}}}(v=Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}[Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.05); lower=0.0, upper=1.0), Truncated(Distributions.Normal{Float64}(μ=0.8, σ=0.05); lower=0.0, upper=1.0)]), Distributions.Product{Distributions.Continuous, Distributions.Normal{Float64}, Vector{Distributions.Normal{Float64}}}(v=Normal{Float64}[Distributions.Normal{Float64}(μ=-1.0, σ=0.1), Distributions.Normal{Float64}(μ=-1.0, σ=0.5)]), 2, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}(0.0, HalfNormal{Float64}(μ=0.5), false))

`Turing` model interface to the AR process

As mentioned above, we can use this instance of the AR latent model to construct a Turing model object which implements the probabilistic behaviour determined by ar. We do this with the constructor function exposed by EpiAware: generate_latent which combines an AbstractLatentModel substype struct with the number of time steps for which we want to generate the latent process.

As a refresher, we remind that the Turing.Model object has the following properties:

The model object parameters are sampleable using rand; that is we can generate parameters from the specified priors e.g. θ = rand(mdl).
The model object is generative as a callable; that is we can sample instances of $Z_t$ e.g. Z_t = mdl().
The model object can construct new model objects by conditioning parameters using the DynamicPPL.jl syntax, e.g. conditional_mdl = mdl | (σ_AR = 1.0, ).

As a concrete example we create a model object for the AR(2) process we specified above for 50 time steps:

ar_mdl = generate_latent(ar, 50)

Model{typeof(generate_latent), (:latent_model, :n), (), (), Tuple{AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}, Int64}, Tuple{}, DefaultContext}(generate_latent, (latent_model = AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}(Distributions.Product{Distributions.Continuous, Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}, Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}}}(v=Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}[Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.05); lower=0.0, upper=1.0), Truncated(Distributions.Normal{Float64}(μ=0.8, σ=0.05); lower=0.0, upper=1.0)]), Distributions.Product{Distributions.Continuous, Distributions.Normal{Float64}, Vector{Distributions.Normal{Float64}}}(v=Normal{Float64}[Distributions.Normal{Float64}(μ=-1.0, σ=0.1), Distributions.Normal{Float64}(μ=-1.0, σ=0.5)]), 2, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}(0.0, HalfNormal{Float64}(μ=0.5), false)), n = 50), NamedTuple(), DefaultContext())

Ultimately, this will only be one component of the full epidemiological model. However, it is useful to visualise its probabilistic behaviour for model diagnostic and prior predictive checking.

We can spaghetti plot generative samples from the AR(2) process with the priors specified above.

plt_ar_sample = let
    n_samples = 100
    ar_mdl_samples = mapreduce(hcat, 1:n_samples) do _
        ar_mdl() .|> exp #Sample Z_t trajectories for the model
    end

    fig = Figure()
    ax = Axis(fig[1, 1];
        yscale = log10,
        ylabel = "Time varying Rₜ",
        title = "$(n_samples) draws from the prior Rₜ model"
    )
    for col in eachcol(ar_mdl_samples)
        lines!(ax, col, color = (:grey, 0.1))
    end
    fig
end

This suggests that a priori we believe that there is a few percent chance of achieving very high $R_t$ values, i.e. $R_t \sim 10-1000$ is not excluded by our priors.

The Renewal model as an `AbstractEpiModel` type

The abstract type for models that generate infections exposed by EpiAware is called AbstractEpiModel. As with latent models different concrete subtypes of AbstractEpiModel define different classes of infection generating process. In this case we want to implement a renewal model.

The Renewal <: AbstractEpiModel type of struct needs two fields:

Data about the generation interval of the infectious disease so it can construct $g_t$.
A prior for the initial numbers of infected.

In Mishra et al they use an estimate of the serial interval of SARS-CoV-2 as an estimate of the generation interval.

truth_GI = Gamma(6.5, 0.62)

Distributions.Gamma{Float64}(α=6.5, θ=0.62)

This is a representation of the generation interval distribution as continuous whereas the infection process will be formulated in discrete daily time steps. By default, EpiAware performs double interval censoring to convert our continuous estimate of the generation interval into a discretized version $g_t$, whilst also applying left truncation such that $g_0 = 0$ and normalising $\sum_t g_t = 1.$

The constructor for converting a continuous estimate of the generation interval distribution into a usable discrete time estimate is EpiData.

model_data = EpiData(gen_distribution = truth_GI)

EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp)

We can compare the discretized generation interval with the continuous estimate, which in this example is the serial interval estimate.

let
    fig = Figure()
    ax = Axis(fig[1, 1];
        xticks = 0:14,
        xlabel = "Days",
        title = "Continuous and discrete generation intervals"
    )
    barplot!(ax, model_data.gen_int;
        label = "Discretized next gen pmf"
    )
    lines!(truth_GI;
        label = "Continuous serial interval",
        color = :green
    )
    axislegend(ax)
    fig
end

The user also needs to specify a prior for the log incidence at time zero, $\log I_0$. The initial history of latent infections $I_{-1}, I_{-2},\dots$ is constructed as

$$I_t = e^{rt} I_0,\qquad t = 0, -1, -2,...$$

Where the exponential growth rate $r$ is determined by the initial reproductive number $R_1$ via the solution to the implicit equation,

$$R_1 = 1 \Big{/} \sum_{t\geq 1} e^{-rt} g_t$$

log_I0_prior = Normal(log(1.0), 1.0)

Distributions.Normal{Float64}(μ=0.0, σ=1.0)

epi = Renewal(model_data; initialisation_prior = log_I0_prior)

Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}(EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp), Distributions.Normal{Float64}(μ=0.0, σ=1.0), EpiAware.EpiInfModels.ConstantRenewalStep{Float64}([0.019313826994143808, 0.04573437575216367, 0.09635404000022223, 0.1731751163417783, 0.24789569560506844, 0.2502660305615846, 0.14059778064943784, 0.026663134095601056]))

NB: We don't implement a background infection rate in this model.

`Turing` model interface to `Renewal` process

As mentioned above, we can use this instance of the Renewal latent infection model to construct a TuringModel which implements the probabilistic behaviour determined by epi using the constructor function generate_latent_infs which combines epi with a provided $\log R_t$ time series.

Here we choose an example where $R_t$ decreases from $R_t = 3$ to $R_t = 0.5$ over the course of 50 days.

R_t_fixed = [0.5 + 2.5 / (1 + exp(t - 15)) for t in 1:50]

50-element Vector{Float64}:
 2.9999979211799306
 2.9999943491892553
 2.9999846395634946
 2.99995824644538
 2.9998865053282437
 2.9996915135600344
 2.999161624673834
 ⋮
 0.5000000000002339
 0.500000000000086
 0.5000000000000316
 0.5000000000000117
 0.5000000000000043
 0.5000000000000016

latent_inf_mdl = generate_latent_infs(epi, log.(R_t_fixed))

Model{typeof(generate_latent_infs), (:epi_model, :_Rt), (), (), Tuple{Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}, Vector{Float64}}, Tuple{}, DefaultContext}(generate_latent_infs, (epi_model = Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}(EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp), Distributions.Normal{Float64}(μ=0.0, σ=1.0), EpiAware.EpiInfModels.ConstantRenewalStep{Float64}([0.019313826994143808, 0.04573437575216367, 0.09635404000022223, 0.1731751163417783, 0.24789569560506844, 0.2502660305615846, 0.14059778064943784, 0.026663134095601056])), _Rt = [1.0986115957278464, 1.098610405062754, 1.0986071685094998, 1.0985983707197156, 1.0985744563952262, 1.098509454567543, 1.0983327911702674, 1.097852790994088, 1.09654964358037, 1.0930193012626002  …  -0.6931471805343999, -0.6931471805505477, -0.693147180556488, -0.6931471805586734, -0.6931471805594774, -0.6931471805597732, -0.693147180559882, -0.693147180559922, -0.6931471805599366, -0.6931471805599422]), NamedTuple(), DefaultContext())

plt_epi = let
    n_samples = 100
    #Sample unconditionally the underlying parameters of the model
    epi_mdl_samples = mapreduce(hcat, 1:n_samples) do _
        latent_inf_mdl()
    end
    fig = Figure()
    ax1 = Axis(fig[1, 1];
        title = "$(n_samples) draws from renewal model with chosen Rt",
        ylabel = "Latent infections"
    )
    ax2 = Axis(fig[2, 1];
        ylabel = "Rt"
    )
    for col in eachcol(epi_mdl_samples)
        lines!(ax1, col;
            color = (:grey, 0.1)
        )
    end
    lines!(ax2, R_t_fixed;
        linewidth = 2
    )
    fig
end

Negative Binomial Observations as an `ObservationModel` type

In Mishra et al latent infections were assumed to occur on their observation day with negative binomial errors, this motivates using the serial interval (the time between onset of symptoms of a primary and secondary case) rather than generation interval distribution (the time between infection time of a primary and secondary case).

Observation models are set in EpiAware as concrete subtypes of an ObservationModel. The Negative binomial error model without observation delays is set with a NegativeBinomialError struct. In Mishra et al the overdispersion parameter $\phi$ sets the relationship between the mean and variance of the negative binomial errors,

$$\text{var} = \text{mean} + {\text{mean}^2 \over \phi}.$$

In EpiAware, we default to a prior on $\sqrt{1/\phi}$ because this quantity is approximately the coefficient of variation of the observation noise and, therefore, is easier to reason on a priori beliefs. We call this quantity the cluster factor.

A prior for $\phi$ was not specified in Mishra et al, we select one below but we will condition a value in analysis below.

obs = NegativeBinomialError(cluster_factor_prior = HalfNormal(0.1))

NegativeBinomialError{HalfNormal{Float64}}(HalfNormal{Float64}(μ=0.1))

`Turing` model interface to the `NegativeBinomialError` model

We can construct a NegativeBinomialError model implementation as a TuringModel using the EpiAwaregenerate_observations functions.

Turing uses missing arguments to indicate variables that are to be sampled. We use this to observe a forward model that samples observations, conditional on an underlying expected observation time series.

First, we set an artificial expected cases curve.

expected_cases = [1000 * exp(-(t - 15)^2 / (2 * 4)) for t in 1:30]

30-element Vector{Float64}:
 2.289734845645553e-8
 6.691586091292782e-7
 1.5229979744712628e-5
 0.0002699578503363014
 0.003726653172078671
 0.04006529739295107
 0.33546262790251186
 ⋮
 0.003726653172078671
 0.0002699578503363014
 1.5229979744712628e-5
 6.691586091292782e-7
 2.289734845645553e-8
 6.101936677605324e-10

obs_mdl = generate_observations(obs, missing, expected_cases)

Model{typeof(generate_observations), (:obs_model, :y_t, :Y_t), (), (:y_t,), Tuple{NegativeBinomialError{HalfNormal{Float64}}, Missing, Vector{Float64}}, Tuple{}, DefaultContext}(generate_observations, (obs_model = NegativeBinomialError{HalfNormal{Float64}}(HalfNormal{Float64}(μ=0.1)), y_t = missing, Y_t = [2.289734845645553e-8, 6.691586091292782e-7, 1.5229979744712628e-5, 0.0002699578503363014, 0.003726653172078671, 0.04006529739295107, 0.33546262790251186, 2.187491118182885, 11.108996538242305, 43.93693362340742  …  11.108996538242305, 2.187491118182885, 0.33546262790251186, 0.04006529739295107, 0.003726653172078671, 0.0002699578503363014, 1.5229979744712628e-5, 6.691586091292782e-7, 2.289734845645553e-8, 6.101936677605324e-10]), NamedTuple(), DefaultContext())

plt_obs = let
    n_samples = 100
    obs_mdl_samples = mapreduce(hcat, 1:n_samples) do _
        θ = obs_mdl() #Sample unconditionally the underlying parameters of the model
    end
    fig = Figure()
    ax = Axis(fig[1, 1];
        title = "$(n_samples) draws from neg. bin. obs model",
        ylabel = "Observed cases"
    )
    for col in eachcol(obs_mdl_samples)
        scatter!(ax, col;
            color = (:grey, 0.2)
        )
    end
    lines!(ax, expected_cases;
        color = :red,
        linewidth = 3,
        label = "Expected cases"
    )
    axislegend(ax)
    fig
end

Composing models into an `EpiProblem`

Mishra et al follows a common pattern of having an infection generation process driven by a latent process with an observation model that links the infection process to a discrete valued time series of incidence data.

In EpiAware we provide an EpiProblem constructor for this common epidemiological model pattern.

The constructor for an EpiProblem requires:

An epi_model.
A latent_model.
An observation_model.
A tspan.

The tspan set the range of the time index for the models.

epi_prob = EpiProblem(epi_model = epi,
    latent_model = ar,
    observation_model = obs,
    tspan = (45, 80))

EpiProblem{Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}, AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}, NegativeBinomialError{HalfNormal{Float64}}}(Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}(EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp), Distributions.Normal{Float64}(μ=0.0, σ=1.0), EpiAware.EpiInfModels.ConstantRenewalStep{Float64}([0.019313826994143808, 0.04573437575216367, 0.09635404000022223, 0.1731751163417783, 0.24789569560506844, 0.2502660305615846, 0.14059778064943784, 0.026663134095601056])), AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}(Distributions.Product{Distributions.Continuous, Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}, Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}}}(v=Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}[Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.05); lower=0.0, upper=1.0), Truncated(Distributions.Normal{Float64}(μ=0.8, σ=0.05); lower=0.0, upper=1.0)]), Distributions.Product{Distributions.Continuous, Distributions.Normal{Float64}, Vector{Distributions.Normal{Float64}}}(v=Normal{Float64}[Distributions.Normal{Float64}(μ=-1.0, σ=0.1), Distributions.Normal{Float64}(μ=-1.0, σ=0.5)]), 2, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}(0.0, HalfNormal{Float64}(μ=0.5), false)), NegativeBinomialError{HalfNormal{Float64}}(HalfNormal{Float64}(μ=0.1)), (45, 80))

Inference Methods

We make inferences on the unobserved quantities, such as $R_t$ by sampling from the model conditioned on the observed data. We generate the posterior samples using the No U-Turns (NUTS) sampler.

To make NUTS more robust we provide manypathfinder, which is built on pathfinder variational inference from Pathfinder.jl. manypathfinder runs nruns pathfinder processes on the inference problem and returns the pathfinder run with maximum estimated ELBO.

The composition of doing variational inference as a pre-sampler step which gets passed to NUTS initialisation is defined using the EpiMethod struct, where a sequence of pre-sampler steps can be be defined.

EpiMethod also allows the specification of NUTS parameters, such as type of automatic differentiation, type of parallelism and number of parallel chains to sample.

num_threads = min(10, Threads.nthreads())

inference_method = EpiMethod(
    pre_sampler_steps = [ManyPathfinder(nruns = 4, maxiters = 100)],
    sampler = NUTSampler(
        adtype = AutoReverseDiff(compile = true),
        ndraws = 2000,
        nchains = num_threads,
        mcmc_parallel = MCMCThreads())
)

EpiMethod{ManyPathfinder, NUTSampler{AutoReverseDiff{true}, MCMCThreads, UnionAll}}(ManyPathfinder[ManyPathfinder(10, 4, 100, 100)], NUTSampler{AutoReverseDiff{true}, MCMCThreads, UnionAll}(0.8, AutoReverseDiff(compile=true), MCMCThreads(), 1, 10, 1000.0, 0.0, 2000, AdvancedHMC.DiagEuclideanMetric, -1))

Inference and analysis

We supply the data as a NamedTuple with the y_t field containing the observed data, shortened to fit the chosen tspan of epi_prob.

south_korea_data = (y_t = data.cases_new[epi_prob.tspan[1]:epi_prob.tspan[2]],
    dates = data.date[epi_prob.tspan[1]:epi_prob.tspan[2]])

(y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], dates = [Date("2020-02-13"), Date("2020-02-14"), Date("2020-02-15"), Date("2020-02-16"), Date("2020-02-17"), Date("2020-02-18"), Date("2020-02-19"), Date("2020-02-20"), Date("2020-02-21"), Date("2020-02-22")  …  Date("2020-03-10"), Date("2020-03-11"), Date("2020-03-12"), Date("2020-03-13"), Date("2020-03-14"), Date("2020-03-15"), Date("2020-03-16"), Date("2020-03-17"), Date("2020-03-18"), Date("2020-03-19")])

In the epidemiological model it is hard to identify between the AR parameters such as the standard deviation of the AR process and the cluster factor of the negative binomial observation model. The reason for this identifiability problem is that the model assumes no delay between infection and observation. Therefore, on any day the data could be explained by $R_t$ changing or observation noise and its not easy to disentangle greater volatility in $R_t$ from higher noise in the observations.

In models with latent delays, changes in $R_t$ impact the observed cases over several days which means that it easier to disentangle trend effects from observation-to-observation fluctuations.

To counter act this problem we condition the model on a fixed cluster factor value.

fixed_cluster_factor = 0.25

0.25

EpiAware has the generate_epiaware function which joins an EpiProblem object with the data to produce as Turing model. This Turing model composes the three unit Turing models defined above: the Renewal infection generating process, the AR latent process for $\log R_t$, and the negative binomial observation model. Therefore, we can condition on variables as with any other Turing model.

mdl = generate_epiaware(epi_prob, south_korea_data) |
      (var"obs.cluster_factor" = fixed_cluster_factor,)

Model{typeof(generate_epiaware), (:y_t, :time_steps, :epi_model), (:latent_model, :observation_model), (), Tuple{Vector{Int64}, Int64, Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}}, Tuple{AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}, NegativeBinomialError{HalfNormal{Float64}}}, ConditionContext{@NamedTuple{obs.cluster_factor::Float64}, DefaultContext}}(generate_epiaware, (y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], time_steps = 36, epi_model = Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}(EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp), Distributions.Normal{Float64}(μ=0.0, σ=1.0), EpiAware.EpiInfModels.ConstantRenewalStep{Float64}([0.019313826994143808, 0.04573437575216367, 0.09635404000022223, 0.1731751163417783, 0.24789569560506844, 0.2502660305615846, 0.14059778064943784, 0.026663134095601056]))), (latent_model = AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}(Distributions.Product{Distributions.Continuous, Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}, Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}}}(v=Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}[Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.05); lower=0.0, upper=1.0), Truncated(Distributions.Normal{Float64}(μ=0.8, σ=0.05); lower=0.0, upper=1.0)]), Distributions.Product{Distributions.Continuous, Distributions.Normal{Float64}, Vector{Distributions.Normal{Float64}}}(v=Normal{Float64}[Distributions.Normal{Float64}(μ=-1.0, σ=0.1), Distributions.Normal{Float64}(μ=-1.0, σ=0.5)]), 2, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}(0.0, HalfNormal{Float64}(μ=0.5), false)), observation_model = NegativeBinomialError{HalfNormal{Float64}}(HalfNormal{Float64}(μ=0.1))), ConditionContext((var"obs.cluster_factor" = 0.25,), DynamicPPL.DefaultContext()))

Sampling with `apply_method`

The apply_method function combines the elements above:

An EpiProblem object or Turing model.
An EpiMethod object.
Data to condition the model upon.

And returns a collection of results:

The epidemiological model as a TuringModel.
Samples from MCMC.
Generated quantities of the model.

inference_results = apply_method(mdl,
    inference_method,
    south_korea_data
)

EpiAwareObservables(Model{typeof(generate_epiaware), (:y_t, :time_steps, :epi_model), (:latent_model, :observation_model), (), Tuple{Vector{Int64}, Int64, Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}}, Tuple{AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}, NegativeBinomialError{HalfNormal{Float64}}}, ConditionContext{@NamedTuple{obs.cluster_factor::Float64}, DefaultContext}}(generate_epiaware, (y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], time_steps = 36, epi_model = Renewal{EpiData{Float64, typeof(exp)}, Normal{Float64}, EpiAware.EpiInfModels.ConstantRenewalStep{Float64}}(EpiData{Float64, typeof(exp)}([0.026663134095601056, 0.14059778064943784, 0.2502660305615846, 0.24789569560506844, 0.1731751163417783, 0.09635404000022223, 0.04573437575216367, 0.019313826994143808], 8, exp), Distributions.Normal{Float64}(μ=0.0, σ=1.0), EpiAware.EpiInfModels.ConstantRenewalStep{Float64}([0.019313826994143808, 0.04573437575216367, 0.09635404000022223, 0.1731751163417783, 0.24789569560506844, 0.2502660305615846, 0.14059778064943784, 0.026663134095601056]))), (latent_model = AR{Product{Continuous, Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}, Vector{Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}}}, Product{Continuous, Normal{Float64}, Vector{Normal{Float64}}}, Int64, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}}(Distributions.Product{Distributions.Continuous, Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}, Vector{Distributions.Truncated{Distributions.Normal{Float64}, Distributions.Continuous, Float64, Float64, Float64}}}(v=Truncated{Normal{Float64}, Continuous, Float64, Float64, Float64}[Truncated(Distributions.Normal{Float64}(μ=0.1, σ=0.05); lower=0.0, upper=1.0), Truncated(Distributions.Normal{Float64}(μ=0.8, σ=0.05); lower=0.0, upper=1.0)]), Distributions.Product{Distributions.Continuous, Distributions.Normal{Float64}, Vector{Distributions.Normal{Float64}}}(v=Normal{Float64}[Distributions.Normal{Float64}(μ=-1.0, σ=0.1), Distributions.Normal{Float64}(μ=-1.0, σ=0.5)]), 2, HierarchicalNormal{Float64, HalfNormal{Float64}, Bool}(0.0, HalfNormal{Float64}(μ=0.5), false)), observation_model = NegativeBinomialError{HalfNormal{Float64}}(HalfNormal{Float64}(μ=0.1))), ConditionContext((var"obs.cluster_factor" = 0.25,), DynamicPPL.DefaultContext())), (y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], dates = [Date("2020-02-13"), Date("2020-02-14"), Date("2020-02-15"), Date("2020-02-16"), Date("2020-02-17"), Date("2020-02-18"), Date("2020-02-19"), Date("2020-02-20"), Date("2020-02-21"), Date("2020-02-22")  …  Date("2020-03-10"), Date("2020-03-11"), Date("2020-03-12"), Date("2020-03-13"), Date("2020-03-14"), Date("2020-03-15"), Date("2020-03-16"), Date("2020-03-17"), Date("2020-03-18"), Date("2020-03-19")]), MCMC chain (2000×52×1 Array{Float64, 3}), @NamedTuple{generated_y_t::Vector{Int64}, I_t::Vector{Float64}, Z_t::Vector{Float64}}[(generated_y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], I_t = [0.27857178260998444, 0.29598687312895033, 0.3538813528444231, 1.403474141522508, 1.7069904135861351, 2.569544225041525, 10.030903386365626, 32.862427741526915, 98.64155381590787, 219.352345253656  …  164.48426882852667, 188.54394591451984, 128.3800867753864, 121.5025580135788, 120.50872855190555, 75.2096680755485, 68.71335037090213, 91.22401159111159, 133.4556748669392, 174.37732157818724], Z_t = [-1.1548769635169445, -0.7857907423736026, -0.3162912076431348, 1.2936072044595406, 1.5364450488603436, 1.6096813997198793, 2.470455582292352, 3.078621538980386, 3.2970874024789443, 3.006459811429605  …  -1.135166730596882, -0.8108375811791091, -0.9481223499615887, -0.7632615170257864, -0.5454989414269815, -0.8263502048257322, -0.7501237870264245, -0.31264347666040687, 0.23032839180131182, 0.614588917321758]); (generated_y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], I_t = [0.38994624118848775, 0.2606677819286797, 1.190427676087981, 0.4414532132119052, 1.0391725714493802, 1.7175745144752947, 18.77162367635914, 49.494503036211114, 61.18163428725609, 138.7377976695288  …  163.5037469080592, 195.20224674013772, 102.40740218012097, 75.13527257753972, 100.23261780033954, 59.97284532499727, 80.6572829109561, 81.95154093205556, 93.47542947330024, 145.2122998327658], Z_t = [-0.9832210139012595, -1.1433806547965644, 0.616042372412841, -0.18136948253161367, 0.665141198194012, 1.0393626127540914, 3.2701569496190515, 3.5872712853830726, 2.5334285125397935, 2.2938311584571385  …  -1.1774764031654068, -0.8990665807843679, -1.3558645799026865, -1.416893188836893, -0.8507925819751212, -1.066894304892395, -0.48688809006648753, -0.23333360377474724, 0.05643788651647136, 0.5700351352506244]); … ; (generated_y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], I_t = [0.9235560798308717, 0.4331579402227369, 0.6642184982569621, 0.8799427439984207, 1.1700202173899932, 2.468185348442503, 12.622458090514922, 48.15760982584189, 52.899061205449065, 153.82003479612465  …  160.34977093567187, 246.83017869470183, 166.28266479542287, 132.55928703569285, 112.02795971710529, 96.28464886321837, 54.2112117384294, 68.97723091265635, 91.33704926339881, 133.48721977813602], Z_t = [-0.9115869974272378, -1.447175350241405, -0.7947314290306344, -0.26751461082086003, 0.2571259635663339, 1.134145462260578, 2.6893237088435074, 3.527240448431013, 2.5284319258250214, 2.527951399869681  …  -0.9135748316132062, -0.36803142360831276, -0.6098399180054241, -0.6841580034938243, -0.719003179367982, -0.7271261985579498, -1.1378393048579698, -0.7134969878590316, -0.22770004517953252, 0.34949574661733623]); (generated_y_t = [0, 0, 0, 1, 1, 1, 15, 34, 75, 190  …  131, 242, 114, 110, 107, 76, 74, 84, 93, 152], I_t = [0.5537373370780555, 0.31343433667823034, 0.8426493555886151, 1.7145502795510532, 1.9173394756937108, 6.017168880276497, 8.774411741981993, 21.470296891029562, 91.30795693719747, 200.50606178319674  …  164.65638411361397, 237.87572478187798, 119.18921944471977, 158.4274526401128, 118.4952203722337, 77.94327354811475, 107.26094830696798, 101.99188364808721, 103.04341201835673, 129.76006200876262], Z_t = [-1.0375251111876458, -1.3460209372233745, -0.09903111133316478, 0.8570919138311954, 1.0724744354114413, 2.016338883227829, 1.9709983617933697, 2.278647937586527, 3.081427178569351, 3.0305108335758515  …  -1.240855419774909, -0.7389606016238437, -1.2193804482715818, -0.6907295850617376, -0.7448312135590128, -0.949197493422504, -0.4341437965072955, -0.3148754543389267, -0.14950719972193055, 0.17357112871682503]);;])

Results and Predictive plotting

To assess the quality of the inference visually we can plot predictive quantiles for generated case data from the version of the model which hasn't conditioned on case data using posterior parameters inferred from the version conditioned on observed data. For this purpose, we add a generated_quantiles utility function. This kind of visualisation is known as posterior predictive checking, and is a useful diagnostic tool for Bayesian inference (see here).

We also plot the inferred $R_t$ estimates from the model. We find that the EpiAware model recovers the main finding in Mishra et al; that the $R_t$ in South Korea peaked at a very high value ($R_t \sim 10$ at peak) before rapidly dropping below 1 in early March 2020.

Note that, in reality, the peak $R_t$ found here and in Mishra et al is unrealistically high, this might be due to a combination of:

A mis-estimated generation interval/serial interval distribution.
An ascertainment rate that was, in reality, changing over time.

In a future note, we'll demonstrate having a time-varying ascertainment rate.

function generated_quantiles(gens, quantity, qs; transformation = x -> x)
    mapreduce(hcat, gens) do gen #loop over sampled generated quantities
        getfield(gen, quantity) |> transformation
    end |> mat -> mapreduce(hcat, qs) do q #Loop over matrix row to condense into qs
        map(eachrow(mat)) do row
            if any(ismissing, row)
                return missing
            else
                quantile(row, q)
            end
        end
    end
end

generated_quantiles (generic function with 1 method)

let
    C = south_korea_data.y_t
    D = south_korea_data.dates

    #Case unconditional model for posterior predictive sampling
    mdl_unconditional = generate_epiaware(epi_prob,
        (y_t = fill(missing, length(C)),)
    ) | (var"obs.cluster_factor" = fixed_cluster_factor,)
    posterior_gens = generated_quantities(mdl_unconditional, inference_results.samples)

    #plotting quantiles
    qs = [0.025, 0.25, 0.5, 0.75, 0.975]

    #Prediction quantiles
    predicted_y_t = generated_quantiles(posterior_gens, :generated_y_t, qs)
    predicted_R_t = generated_quantiles(
        posterior_gens, :Z_t, qs; transformation = x -> exp.(x))

    ts = D .|> d -> d - minimum(D) .|> d -> d.value + 1
    t_ticks = string.(D)
    fig = Figure()
    ax1 = Axis(fig[1, 1];
        ylabel = "Daily cases",
        xticks = (ts[1:14:end], t_ticks[1:14:end]),
        title = "Posterior predictive: Cases"
    )
    ax2 = Axis(fig[2, 1];
        yscale = log10,
        title = "Prediction: Reproduction number",
        xticks = (ts[1:14:end], t_ticks[1:14:end])
    )
    linkxaxes!(ax1, ax2)

    lines!(ax1, ts, predicted_y_t[:, 3];
        color = :purple,
        linewidth = 2,
        label = "Post. median"
    )
    band!(ax1, 1:size(predicted_y_t, 1), predicted_y_t[:, 2], predicted_y_t[:, 4];
        color = (:purple, 0.4),
        label = "50%"
    )
    band!(ax1, 1:size(predicted_y_t, 1), predicted_y_t[:, 1], predicted_y_t[:, 5];
        color = (:purple, 0.2),
        label = "95%"
    )
    scatter!(ax1, C;
        color = :black,
        label = "Actual cases")
    axislegend(ax1)

    lines!(ax2, ts, predicted_R_t[:, 3];
        color = :green,
        linewidth = 2,
        label = "Post. median"
    )
    band!(ax2, 1:size(predicted_R_t, 1), predicted_R_t[:, 2], predicted_R_t[:, 4];
        color = (:green, 0.4),
        label = "50%"
    )
    band!(ax2, 1:size(predicted_R_t, 1), predicted_R_t[:, 1], predicted_R_t[:, 5];
        color = (:green, 0.2),
        label = "95%"
    )
    axislegend(ax2)

    fig
end

Parameter inference

We can interrogate the sampled chains directly from the samples field of the inference_results object.

let
    sub_chn = inference_results.samples[inference_results.samples.name_map.parameters[[1:5;
                                                                                       end]]]
    fig = pairplot(sub_chn)
    lines!(fig[1, 1], ar.init_prior.v[1], label = "Prior")
    lines!(fig[2, 2], ar.init_prior.v[2], label = "Prior")
    lines!(fig[3, 3], ar.damp_prior.v[1], label = "Prior")
    lines!(fig[4, 4], ar.damp_prior.v[2], label = "Prior")
    lines!(fig[5, 5], ar.ϵ_t.std_prior, label = "Prior")
    lines!(fig[6, 6], epi.initialisation_prior, label = "Prior")

    fig
end

Example: Early COVID-19 case data in South Korea

Dependencies for this notebook

Load early SARS-2 case data for South Korea

Time-varying reproduction number as an AbstractLatentModel type

Turing model interface to the AR process

The Renewal model as an AbstractEpiModel type

Turing model interface to Renewal process

Negative Binomial Observations as an ObservationModel type

Turing model interface to the NegativeBinomialError model

Composing models into an EpiProblem