McMasterPandemic is a modelling tool that was rapidly developed to provide timely insights into the COVID-19 pandemic in Canada. We are currently refactoring this tool so that it is faster and more general. This refactoring will be done incrementally by implementing a series of versioned specifications that define each step. This document outlines these specifications and provides a roadmap for planning future specifications.
We are using semantic versioning (https://semver.org/) to provide versions for the specs. All versions of a spec that have not been implemented in a released version of the McMasterPandemic R package have major version 0 (e.g. 0.x.y, https://semver.org#spec-item-4) – currently all of them. Although none of these specs are released, they are being experimentally implemented on the TMB branch of McMasterPandemic, with R-side code here. There is a global option, MP_flex_spec_version, that is used by this R code to control what spec version is being assumed by the TMB/C++ code. This helps in testing as well. C++ files that implement various spec versions are currently being placed here.
Spec versions have four life cycle stages: frozen, experimental, recommended, and archived. If a frozen version needs clarification or updating, this clarification must be done by linking to a subsequent version that addresses the issue(s). Each frozen version of the spec must contain the following five sections.
- R for model specification
- Data passed from R to TMB/C++ that stores the model structure
- C++ side

The specs for the following versions are frozen, which means that they will not change.

The specs for the following versions are being actively developed and modified. Note that version numbers of experimental versions can change without warning.
The specs for archived versions are frozen but no longer respected by the McMasterPandemic R package and/or its test framework. Archived versions cannot be ignored because frozen and experimental versions may still reference them.
Log-linear time-variation of parameters. This feature generalizes the piece-wise time-variation of parameters to allow for the following.
This is a
Such parameters have their own parameters that determine a model of time-variation. These latter parameters get transformed to produce a time series of parameter values of length equal to the number of time steps. One simple thing to do would be to put these time series into the parameter vector, and when parameters are accessed use the index spi + t, where spi is the index into the state_param vector that picks out the first parameter in the time series and t is the current time step.
\[ \log(u) = \log(u_0) + Xv \] where \(u\) is a vector of length \(T\), the number of time steps, \(u_0\) is a scalar parameter, \(v\) is a vector of length \(D < T\) that parameterizes the model of time variation, and \(X\) is a \(T\)-by-\(D\) matrix of constant columns (e.g. indicator variables, spline bases). Note that logs of vectors are taken element-wise.
\[ \log(u_t) = \log(u_0) + \sum_{i=1}^Dx_{ti}v_i \] \[ u_t = u_0 \prod_{i=1}^D\exp\left(x_{ti}v_i\right) \] For example:
In this case we have the following time variation of \(u\). \[ u_t = \begin{cases} u_0, & t < t_1 \\ \frac{u_0}{2}, & t_1 \le t < t_2 \\ \frac{u_0}{4}, & t \ge t_2 \end{cases} \]
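A minimal R sketch of this calculation, reproducing the example above; the objects u0, X, and v are just stand-ins for the symbols in the formulas, not names used by the package.

```r
# log-linear time variation: log(u) = log(u0) + X %*% v
n_steps <- 10; t1 <- 4; t2 <- 8
u0 <- 1
X <- cbind(as.numeric(seq_len(n_steps) >= t1),  # indicator column: t >= t1
           as.numeric(seq_len(n_steps) >= t2))  # indicator column: t >= t2
v <- c(log(1/2), log(1/2))                      # each break halves the parameter
u <- exp(log(u0) + as.vector(X %*% v))
# u equals u0 for t < t1, u0/2 for t1 <= t < t2, and u0/4 for t >= t2
```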
Lag-n differencing with variable lag. This will allow us to difference between time points with observed data, which is necessary when the frequency of data collection changes and other cases with uneven time-series.
One way to frame this problem is to convert the scalar integer, \(n\), into a vector of length equal to the number of simulation iterations. Let \(n_t\) be the differencing lag at time \(t\), and \(x_t\) and \(y_t\) be the input and output variables of the differencing at this time.
\[ y_t = x_t - x_{t-n_t} \]
If \(n_t = n\) for all \(t\), then we have the special case of lag-n differencing in spec version 0.2.0.
For all times where \(n_t = 0\), we will get \(y_t = 0\) as well. This corresponds for example to delays in the reporting of incidence, where there are periods of several days over which we do not have any new reports – typically not because there are no new cases but because they have not yet been reported.
Consider a case where the frequency of reporting changes from \(n = 1\) to \(n = 7\) (daily to weekly) at a particular time, \(\tau\). The following example takes \(\tau = 101\).
## t n x y
## 1 98 1 295 NA
## 2 99 1 300 5
## 3 100 1 310 10
## 4 101 0 320 0
## 5 102 0 330 0
## 6 103 0 340 0
## 7 104 0 350 0
## 8 105 0 360 0
## 9 106 0 370 0
## 10 107 7 400 90
## 11 108 0 410 0
## 12 109 0 420 0
## 13 110 0 430 0
## 14 111 0 440 0
## 15 112 0 450 0
## 16 113 0 460 0
## 17 114 0 500 100
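The variable-lag differencing itself can be sketched in R as follows; the function name lag_diff is hypothetical and only illustrates the rule \(y_t = x_t - x_{t-n_t}\) with \(y_t = 0\) whenever \(n_t = 0\).

```r
# variable-lag differencing: y[t] = x[t] - x[t - n[t]], with y[t] = 0 when n[t] = 0
lag_diff <- function(x, n) {
  stopifnot(length(x) == length(n))
  y <- numeric(length(x))
  for (t in seq_along(x)) {
    if (n[t] == 0) {
      y[t] <- 0
    } else if (t - n[t] >= 1) {
      y[t] <- x[t] - x[t - n[t]]
    } else {
      y[t] <- NA  # lag reaches back before the start of the series
    }
  }
  y
}

# reproduces the first rows of the table above
lag_diff(c(295, 300, 310, 320), c(1, 1, 1, 0))
```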
lag_diff_sri remains unchanged from previous versions. lag_diff_delay_n is changed from a vector to a matrix, with rows associated with the elements of lag_diff_sri and one column for each iteration.
Condensation and calibration on the C++ side.
Condensation is the process of summarizing the concatenated_state_vector so that it can be compared with available data. This comparison is done quantitatively through a negative log-likelihood function that measures how different the condensed concatenated_state_vector is from the data. The user can add any of the following variables to the condensed dataset.
- variables computed in state and stored in concatenated_state_vector
- variables computed in sp but not stored
- variables computed in ratemat and stored in concatenated_ratemat_nonzeros
The following are computed in classic MacPan.

- hosp
- death
- report
The convolution is based on a gamma-density kernel.
The shape parameter of this gamma density depends on params['c_delay_cv']. \[ \text{shape} = \frac{1}{\text{c\_delay\_cv}^2} \] The scale parameter depends on params['c_delay_mean'] and the shape. \[ \text{scale} = (\text{c\_delay\_mean}) (\text{c\_delay\_cv})^2 \] Compute the vector \(\gamma\) using the TMB pgamma function, which takes a vector q and scalar shape and scale arguments. \[ \gamma = \text{pgamma}(q, \text{shape}, \text{scale}) \]
where \(q\) is a constant integer vector that gets created in the following way in R.
qmax = ceiling(qgamma(0.95, shape, scale = scale))
q = 1:qmax
We will avoid computing qmax in C++ because it could introduce a discontinuity that could cause automatic differentiation to fail. This difference between refactored and classic MacPan only has the potential to cause numerical differences whenever params['c_delay_cv'] and params['c_delay_mean'] are varying – typically because they are in opt_pars – but I have not yet seen any examples where this is the case. To guarantee matched results at least for the initial parameter vector we will set qmax as follows on the R-side and pass it through a DATA_ macro.
qmax = ceiling(qgamma(0.95, 1/params[["c_delay_cv"]]^2, scale = params[["c_delay_mean"]] * params[["c_delay_cv"]]^2))
The delay kernel for the convolution is a numeric vector given by the following expression. \[ \kappa_i = (\text{c_prop})\frac{\delta_i}{\sum_j\delta_j} \] where \(\delta_j = \gamma_{j+1} - \gamma_j\) and \(\text{c_prop}\) is a scalar in the parameter vector.
The convolution, \(z\), of the time series of any state variable, \(x\), is given by the following. \[ z_{i+m-1} = \sum_{j = 1}^m \kappa_{m-j+1} x_{i+j-1} \] where \(i = 1,...,n-m\), \(m\) is the length of \(\kappa\), and \(n\) is the number of time steps (length of the input vector \(x\)). Note that the first \(m-1\) elements of \(z\) remain undefined and so they do not need to be stored on the C++-side because these \(m-1\) missing values can be added on the R-side.
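The whole chain – kernel construction followed by the convolution – can be sketched in R as follows; the parameter values and the stand-in series x are arbitrary illustrations, not defaults used by the package.

```r
# gamma-density delay kernel and convolution of a simulated series x
c_delay_cv <- 0.25; c_delay_mean <- 11; c_prop <- 0.1   # illustrative values
shape <- 1 / c_delay_cv^2
scale <- c_delay_mean * c_delay_cv^2
qmax  <- ceiling(qgamma(0.95, shape, scale = scale))
q     <- 1:qmax
gamma_vec <- pgamma(q, shape, scale = scale)
delta <- diff(gamma_vec)                  # delta_j = gamma_{j+1} - gamma_j
kappa <- c_prop * delta / sum(delta)      # delay kernel
m <- length(kappa)

x <- rpois(50, lambda = 20)               # stand-in for a state-variable time series
z <- rep(NA_real_, length(x))
for (i in 1:(length(x) - m)) {
  # z_{i+m-1} = sum_j kappa_{m-j+1} * x_{i+j-1}
  z[i + m - 1] <- sum(rev(kappa) * x[i:(i + m - 1)])
}
```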
In Progress: in collaboration with Weiguang
Condensation is the process of summarizing the history of the simulations. Before solving the condensation problem we should refactor how this history is currently summarized. Currently the concatenated_state_vector and concatenated_ratemat_nonzeros contain some history information, but it will be awkward to continue with this strategy. Instead we will introduce a matrix called simulation_history with rows corresponding to simulation steps and columns corresponding to the following (in the following order).
Items 1-4 are computed at simulation time, and simply need to be added to the simulation_history matrix at each iteration. Items 5-7 should be computed at the end of the simulation. These columns should be computed in the same order as they will appear in the simulation_history matrix. The input for these computations (5-7) will be stored in the following new data structures.
- sr_count – vector the same length as the number of columns added in step 3a giving the number of simulation_history columns to use as input for each new column (note that input columns can be used more than once – see count vector for a similar structure)
- sri – indices into the columns of simulation_history to use as input (see spi vector for a similar structure)
- sr_modifier – bitwise integer encoding of the parse tree (see modifier for a similar data structure)
- lag_diff_sri – indices into the columns of simulation_history to use as input to lag-n difference operations
- lag_diff_delay_n – vector same length as lag_diff_sri containing the delay (i.e. the “n” in lag-n) of the lag for each lag-n difference operation
- conv_sri – indices into the columns of simulation_history to use as input to convolution operations
- conv_c_prop_idx, conv_c_delay_cv_idx, conv_c_mean_cv_idx – three vectors each the same length as conv_sri giving indices into params for extracting the parameters of the convolutions
- conv_qmax – vector the same length as conv_sri giving the maximum integer in the q vector for each convolution

Add simulation_history to the report. On the R-side we could start using simulation_history where we currently use concatenated_* vectors, and eventually we could drop them.
We compare the negative binomial negative log likelihood with observed_vector as data and condensed_state_vector as predictions. Apologies that the word ‘negative’ is being used in two different ways – the first refers to the negative binomial distribution, and the second to the fact that we are multiplying the log likelihood by negative one. In R we would write this as follows.
-1 * dnbinom(observed_vector, mu = condensed_state_vector, size = nb_disp, log = TRUE)
In TMB, this negative binomial density function should be used for this (I think … how does size on the R-side correspond to var in TMB?).
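Assuming TMB's dnbinom2 is parameterized by the mean and variance, the usual negative binomial identity var = mu + mu^2/size (which also appears in the argument list later in this spec) gives the correspondence. The following R snippet just checks that identity numerically; it is a sanity check, not package code.

```r
# relate R's dnbinom(mu, size) to a mean/variance parameterization
x <- 7; mu <- 5; size <- 2
var <- mu + mu^2 / size                 # variance implied by (mu, size)
size_back <- mu^2 / (var - mu)          # recover size from (mu, var)
all.equal(dnbinom(x, mu = mu, size = size, log = TRUE),
          dnbinom(x, mu = mu, size = size_back, log = TRUE))  # TRUE
```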
The first three arguments to this function are vectors of the same size.

- observed_vector is an observed data stream that is provided by the user
- condensed_state_vector is described in detail above
- nb_disp is a vector of dispersion parameters, which is composed of values passed in through PARAMETER macros (see below for a description of how nb_disp is constructed)

The sum of these negative log likelihood values gives a loss function to be returned to the user.
TODO: how to handle clamping when we leave the support of the density? concerned about discontinuities.
We require an additional set of parameters to model negative binomial dispersion, nb_disp. The user has two options.

- nb_disp
- condensed_state_vector
The user is allowed to specify a prior distribution for any element of params, tv_mult, or nb_disp. Initially only univariate normal prior distributions will be allowed, but this should be built to be extensible allowing for other distributions to be added in the future.
For each parameter with a prior, the user specifies the distribution (initially normal will be the only choice), and a parameter vector for the arguments of the corresponding density function. At each evaluation of the optimizer the logs of the prior densities are subtracted from the negative log-likelihood function.
The TMB normal density function should be used. The design should be extensible so that future developers can add other distributions from here.
The user is allowed to express any value of params or tv_mult on one of several transformed scales. The corresponding inverse transformation is applied immediately as the parameter is read through the appropriate macro on the TMB-side. The user should be allowed to specify one of the following transformations.

- log10
- log
- logit
The corresponding inverse transformations that would be applied after the macros are the following.

- 10^x
- exp(x)
- 1/(1+exp(-x))
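For concreteness, here is a small R sketch pairing each transformation with its inverse; this is a plain named list for illustration, not the package's internal representation.

```r
# forward transformations that the user may apply, and the inverse
# transformations applied after the parameter is read on the TMB side
trans_funs <- list(
  log10 = function(x) log10(x),
  log   = function(x) log(x),
  logit = function(x) log(x / (1 - x))
)
inverse_funs <- list(
  log10 = function(x) 10^x,
  log   = function(x) exp(x),
  logit = function(x) 1 / (1 + exp(-x))
)

# round trip: inverse(transform(p)) returns the original parameter value
p <- 0.3
inverse_funs$logit(trans_funs$logit(p))  # 0.3
```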
Let \(l_{\psi_i}\left(x_i(\theta); y_i\right)\) be a loss function where \(x_i(\theta)\) is a simulated element of the history matrix, \(y_i\) is an associated observed data point, and \(\psi_i\) is a vector of loss function parameters. Note that we write the simulation values as functions of \(\theta\) to indicate their dependence on parameters. Every element \(i\) that belongs to the same column of the history matrix has the same value for \(\psi_i\), but to simplify notation \(\psi\) is sub-scripted by \(i\). Let \(r_{\phi_j}(\theta_j)\) be an optional regularization function for each parameter \(\theta_j\) being optimized, where \(\phi_j\) is a vector of regularization function parameters associated with parameter \(j\). The objective function is then given by the following.
\[ L_{\psi, \phi}(\theta; y) = \sum_i l_{\psi_i}\left(x_i(\theta); y_i\right) + \sum_j r_{\phi_j}(\theta_j) \] Currently the only objective function that is offered is the negative log negative binomial density, but the infrastructure we build assumes that other loss functions will be allowed in the future. Similarly the only regularizing function that is available is the negative log normal density.
Each parameter, \(\theta_j\), can be optionally passed into the objective function on a transformed scale. These transformations allow users to enforce limits on the domain of the parameter space (e.g. a logit transformation will keep parameter values between 0 and 1). The regularization functions are applied on the transformed scale, but the elements of the rate matrix (and therefore the simulations, \(x\)) are computed using the parameters on their original scale. Therefore, we have the following ordering of operations.
- the inverse transformations are applied as each parameter is read into the c++ params vector
- the regularization functions are applied on the transformed scale in the c++ function objective_function
- dnbinom2 is used to compute the log negative binomial density

The arguments of dnbinom2 can be computed as follows.

- x – observed data
- mu – simulated data
- var – mu + mu^2 / disp, where disp is the dispersion parameter
- give_log – 1

For the mu argument, the user may optionally choose to smoothly constrain the simulated data to be greater than a lower bound, eps, by applying the function mu_constrained = mu + eps*exp(-mu/eps).
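A short R illustration of this smooth lower bound: for mu well above eps the correction is negligible, and as mu goes to zero the constrained value approaches eps.

```r
# smooth lower bound on the simulated mean
mu_constrained <- function(mu, eps) mu + eps * exp(-mu / eps)
mu_constrained(10, 1e-2)   # ~10, essentially unchanged
mu_constrained(0, 1e-2)    # exactly eps
```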
Two regularization families are currently available.

- flat (C++: flat)
- normal (C++: normal)

The arguments of the normal density are the following.

- x – \(f(\theta)\) – value of the parameter on the transformed scale
- mean – \(\mu_{f(\theta)}\) – first regularization parameter
- sd – \(\sigma_{f(\theta)}\) – second regularization parameter
- give_log – 1

The first regularization parameter for every regularization function is used as the starting value for the optimizer on the scale of the transformed parameter.
The transformations that can be applied to optimized parameters are the following.

- identity
- log
- log10
- logit
- cloglog
- inverse
We will pass the following additional data through to TMB to specify the objective function.
- obs_var_id – id giving the column in the history matrix associated with each variable in the objective function
- obs_loss_id – identifier of the loss function being used for each variable (currently only one – negative log negative binomial density)
- obs_loss_param_count – number of loss function parameters associated with each variable
- obs_spi_loss_param – index into sp containing the loss function parameters. The number of such parameters is sum(obs_loss_param_count)
- obs_time_step – row in the history matrix
- obs_history_col_id – column in the history matrix (match with obs_var_id above to determine the obs_loss_id and obs_loss_param_count)
- obs_value – value observed in data to compare with the associated element in the history matrix
- obs_do_sim_constraint – Boolean determining if the lower bound in obs_sim_lower_bound should be applied to the simulated data with smooth constraint function mu_constrained = mu + eps*exp(-mu/eps)
- obs_sim_lower_bound – lower bound applied if obs_do_sim_constraint is true
- opt_param_id – indices into the params vector giving the parameters to be optimized
- opt_trans_id – index identifying a transformation – see above for correspondence between indices and transformations
- opt_count_reg_params – number of regularization parameters associated with each parameter to be optimized
- opt_reg_family_id – index identifying a family of regularization functions – currently only two are available (flat, normal)

For the tv_mult vector:

- opt_tv_param_id – indices into the tv_mult vector giving the time varying multipliers to be optimized
- opt_tv_trans_id – index identifying a transformation – see above for correspondence between indices and transformations
- opt_tv_count_reg_params – number of regularization parameters associated with each parameter to be optimized
- opt_tv_reg_family_id – index identifying a family of regularization functions – currently only two are available (flat, normal)
- opt_reg_params – regularization parameters – the size of this vector is opt_count_reg_params.sum()
- opt_tv_reg_params – regularization parameters for tv_mult – the size of this vector is opt_tv_count_reg_params.sum()
{opt-trans}_{parameter} ~ {reg-trans}_{baseline-prior-family}({hyperparameter1}, ...)
{opt-trans}_{parameter} ~ {reg-trans}_{time-varying-prior-family}({hyperparameter-vector1}, ...)
Save common factors when building rate expressions
Expressions involving sums and products of state variables and parameters, as well as their complements and inverses, can be saved in a vector, \(\phi\). The elements of \(\phi\) can then be used in expressions that define elements of the rate matrix.
Four new integer vectors:
- factr_spi – 1-based index into the elements of the sp vector that will contain the common factor
- factr_count – vector the same length as factr_spi containing the number of elements in sp used to compute the common factor
- factr_spi_compute – vector with sum(factr_count) elements giving the indices into the elements of the sp vector for extracting the elements used to compute the new factor
- factr_modifier – vector with sum(factr_count) elements describing the operations to be performed for generating the common factor (see v0.0.1 for details on modifier vectors)

This is a very big ‘patch’. The motivation for it is to get a first release that is potentially useful to PHAC. This requires three enhancements.
- Add tv_mult to the parameters so that we can calibrate them if we need to do so

Let \(f_i^{(out)}\) be the \(i\)th element of the outflow vector, which is computed as follows. \[ f_i^{(out)} = \sum_{j \in \Omega_i}F_{ij} \]
where \(\Omega_i\) is a subset of the indices into the state vector. Typically there are only a small number of unique index sets, with one \(\Omega_i\) for many values of \(i\) – for this reason the data structure for these indices only needs to track each unique index set.
We can express the previous definition of the outflow vector (TODO: link to previous definition) in terms of this generalization as a constant \(\Omega = \Omega_i, \forall i\).
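A minimal R sketch of this generalized outflow, with a made-up flows matrix and a list of index sets standing in for the \(\Omega_i\) (many rows can share the same set):

```r
# generalized outflow: f_out[i] = sum of flows[i, j] over j in Omega[[i]]
flows <- matrix(runif(16), nrow = 4, ncol = 4)
Omega <- list(1:4, 1:4, c(1, 3), c(1, 3))   # one index set per row
f_out <- vapply(seq_len(nrow(flows)),
                function(i) sum(flows[i, Omega[[i]]]),
                numeric(1))
```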
In previous spec versions, the C++ side needs to get the initial state vector directly from R. The difficulty with this strategy is that the initial state vector depends on the parameter vector, so when a new parameter vector gets updated in an optimization or ensemble run it becomes necessary to drop back to R, causing a substantial performance bottleneck. So here we add the ability to create the initial state vector from the parameter vector in C++.
What follows is a detailed description of the procedure, which makes reference to new items in the data structure (see Data Structure below).
The procedure works with two initializing state vectors:

- state – initializing state vector for the full non-linearized model
- lin_state – initializing state vector for the linearized model

The steps are the following.

1. Identify a subset (lin_params) of the parameter vector (params).
2. Fill lin_params with user supplied constant values (lin_param_vals, lin_param_count, lin_param_idx – these can be found in model$tmb_indices$linearized_params).
3. Fill elements of lin_state with elements of lin_params (df_state_par_idx, df_state_count, df_state_idx – these can be found in model$tmb_indices$disease_free).
4. Compute the Jacobian (jac), which requires the following steps in a function that takes lin_state and returns an updated lin_state:
   - compute the linearized rate matrix (lin_ratemat) from lin_state and lin_params using the existing make_ratemat_indices
   - compute the flow matrix (lin_flow) from lin_state and lin_ratemat
   - compute the inflow vector (lin_inflow) from lin_flow by taking column sums
   - compute the outflow vector (lin_outflow) from lin_flow by the method above – Generalization of Outflow Vector (linearized_outflow_row_count, linearized_outflow_col_count, linearized_outflow_rows, linearized_outflow_cols)
   - update lin_state using lin_inflow and lin_outflow
5. Use jac and lin_state to create a new vector eigvec (all_drop_eigen_idx – can be found in model$tmb_indices$initialization_mapping).
6. Compute the dominant eigenvector, eigvec, of jac.
7. Subset eigvec to create eig_infected (eigen_drop_infected_idx – can be found in model$tmb_indices$initialization_mapping).
8. Normalize eig_infected to have elements that sum to one.
9. Set state[all_to_infected_idx] = eig_infected * param[infected_idx] (infected_idx can be found in model$tmb_indices$initial_population).
10. Set state[susceptible_idx] = (1/susceptible_idx.size()) * (param[total_idx] - param[infected_idx]) (infected_idx and total_idx can be found in model$tmb_indices$initial_population).
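The eigenvector-based steps of this procedure can be sketched in R as follows. The Jacobian, index vectors, and parameter values here are all hypothetical stand-ins for illustration, not the package's actual objects or API.

```r
# illustrative initialization of infected and susceptible states from a
# dominant eigenvector of a (stand-in) linearized-model Jacobian
set.seed(1)
n_states <- 6
jac <- matrix(rnorm(n_states^2), n_states, n_states)

# dominant eigenvector (eigenvalue with the largest real part)
eig <- eigen(jac)
eigvec <- Re(eig$vectors[, which.max(Re(eig$values))])

# suppose states 2:4 are the infected states represented in the eigenvector
eigen_to_infected_idx <- 2:4
eig_infected <- eigvec[eigen_to_infected_idx]
eig_infected <- eig_infected / sum(eig_infected)   # normalize to sum to one

# fill the full state vector: infected states get the scaled eigenvector,
# susceptible states share the remaining population equally
total <- 1e6; infected <- 10                       # stand-ins for param[total_idx], param[infected_idx]
all_to_infected_idx <- 2:4
susceptible_idx <- 1
state <- numeric(n_states)
state[all_to_infected_idx] <- eig_infected * infected
state[susceptible_idx] <- (1 / length(susceptible_idx)) * (total - infected)
```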
Objects of class flexmodel contain a new element called tmb_indices$outflow, with the following elements.
- row_count
- col_count
- rows
- cols

These indices encode information in the \(\Omega_i\) index sets (see Mathematical Description above).
This outflow item means that par_accum_indices is no longer required. Given that we do not need to worry about back compatibility yet, it will be dropped from this and all future spec versions.
Objects of class flexmodel contain the following new elements.
- linearized_params – indices to update some elements of lin_param before computing the Jacobian of the linearized model
  - lin_param_vals – values used to fill the elements of lin_param
  - lin_param_count – the number of elements of lin_param that will be updated with each value in lin_param_vals
  - lin_param_idx – indices of lin_param to update
- disease_free – indices to initialize lin_state to be nearly disease-free, by setting some of its elements equal to some elements of lin_param
  - df_state_par_idx – indices of lin_param used to fill the elements of lin_state
  - df_state_count – the number of elements of lin_state that will be updated with each parameter identified by df_state_par_idx
  - df_state_idx – indices of lin_state to update
- linearized_outflow – same as outflow but for the linearized model
- initialization_mapping – indices into various subsets of the state vector and eigenvector that could be useful for initializing the state – see justification below this list
  - all_to_eigen_idx – indices into state and lin_state for identifying states in eigvec
  - all_to_infected_idx – indices into state and lin_state for identifying states in eig_infected
  - eigen_to_infected_idx – indices into eigvec for identifying states in eig_infected
  - all_drop_eigen_idx – indices into state and lin_state for identifying states not in eigvec
  - all_drop_infected_idx – indices into state and lin_state for identifying states not in eig_infected
  - eigen_drop_infected_idx – indices into eigvec for identifying states not in eig_infected
  - susceptible_idx – indices into state and lin_state for identifying states associated with the initial susceptible population
- initial_population – indices into the parameter vector for identifying the total number of individuals in the population and the initial total number of infected individuals in the population
  - total_idx
  - infected_idx
The justification for the initialization_mapping is given here. One potential source of confusion in the above process is the need to keep track of the different subsets of states. To help clarify this point, there are conceptually three types of state vectors containing nested subsets of states.

1. all_states – vectors containing all states
2. eigen_states – vectors containing all states of the linearized system that are included in the dominant eigenvector
3. infected_states – vectors containing all infected states, which are eigen_states that will be initialized using corresponding elements in the eigenvector
To make life a little easier for development on the C++ side we provide index vectors for converting between the three nested sets of states.

1. all_to_eigen_idx
2. all_to_infected_idx
3. eigen_to_infected_idx

And for completeness, in case it is useful, we also provide index vectors for dropping states from each kind of state vector.

1. all_drop_eigen_idx
2. all_drop_infected_idx
3. eigen_drop_infected_idx
We bump the major version here to start working on model structure (e.g. vaccination, age, testing). We make two enhancements:

- a struc object class that allows for symbolic manipulation of matrices and vectors of parameters and state variables when defining rate matrix elements
- within each simulation iteration, a user-defined number of sums of state variables and parameters are computed and stored immediately before the rate matrix is updated
As an example, the force of infection under the two-dose vaccination model can be given by the following struc objects.
Start by getting a set of vaccination parameters and state variables.
Create a symbolic vector of I-state variables, with 4 base states by 5 vaccination categories = 20 variables.
base_params <- read_params("PHAC.csv")
vax_params <- expand_params_vax(
  params = base_params,
  model_type = "twodose"
)
Istate = (McMasterPandemic:::expand_names(
  c('Ia', 'Ip', 'Im', 'Is'), # = Icats
  attr(vax_params, 'vax_cat')) # = vax_cats
  %>% struc
)
as.matrix(Istate)
## [,1]
## [1,] "(Ia_unvax)"
## [2,] "(Ip_unvax)"
## [3,] "(Im_unvax)"
## [4,] "(Is_unvax)"
## [5,] "(Ia_vaxdose1)"
## [6,] "(Ip_vaxdose1)"
## [7,] "(Im_vaxdose1)"
## [8,] "(Is_vaxdose1)"
## [9,] "(Ia_vaxprotect1)"
## [10,] "(Ip_vaxprotect1)"
## [11,] "(Im_vaxprotect1)"
## [12,] "(Is_vaxprotect1)"
## [13,] "(Ia_vaxdose2)"
## [14,] "(Ip_vaxdose2)"
## [15,] "(Im_vaxdose2)"
## [16,] "(Is_vaxdose2)"
## [17,] "(Ia_vaxprotect2)"
## [18,] "(Ip_vaxprotect2)"
## [19,] "(Im_vaxprotect2)"
## [20,] "(Is_vaxprotect2)"
Create a vector of base transmission rates associated with each of the 4 base I-states.
baseline_trans_rates =
  struc(
    'Ca',
    'Cp',
    '(1 - iso_m) * (Cm)',
    '(1 - iso_s) * (Cs)') *
  struc('(beta0) * (1/N)')
as.matrix(baseline_trans_rates)
## [,1]
## [1,] "(Ca) * (beta0) * (1/N)"
## [2,] "(Cp) * (beta0) * (1/N)"
## [3,] "(1 - iso_m) * (Cm) * (beta0) * (1/N)"
## [4,] "(1 - iso_s) * (Cs) * (beta0) * (1/N)"
Create a 5-by-5 block matrix of vaccine efficacies
vax_trans_red = struc_block(struc(
  '1',
  '1',
  '(1 - vax_efficacy_dose1)',
  '(1 - vax_efficacy_dose1)',
  '(1 - vax_efficacy_dose2)'),
  row_times = 1, col_times = 5)
as.matrix(vax_trans_red)
## [,1] [,2]
## [1,] "(1)" "(1)"
## [2,] "(1)" "(1)"
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
## [,3] [,4]
## [1,] "(1)" "(1)"
## [2,] "(1)" "(1)"
## [3,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)" "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)" "(1 - vax_efficacy_dose2)"
## [,5]
## [1,] "(1)"
## [2,] "(1)"
## [3,] "(1 - vax_efficacy_dose1)"
## [4,] "(1 - vax_efficacy_dose1)"
## [5,] "(1 - vax_efficacy_dose2)"
And finally the force of infection vector, which suggests that we really need to introduce the capability for common factors.
foi = kronecker(vax_trans_red, t(baseline_trans_rates)) %*% Istate
foi
## struc object with 5 rows and 1 column:
##
## Row 1
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 2
## (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 3
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 4
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose1) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose1) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
## Row 5
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_unvax) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_unvax) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_unvax) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_unvax) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose1) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose1) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose1) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose1) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect1) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxdose2) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxdose2) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxdose2) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxdose2) +
## (1 - vax_efficacy_dose2) * (Ca) * (beta0) * (1/N) * (Ia_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (Cp) * (beta0) * (1/N) * (Ip_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (1 - iso_m) * (Cm) * (beta0) * (1/N) * (Im_vaxprotect2) +
## (1 - vax_efficacy_dose2) * (1 - iso_s) * (Cs) * (beta0) * (1/N) * (Is_vaxprotect2)
Start with a basic example model.
params <- read_params("ICU1.csv")
state <- make_state(params = params)
M <- McMasterPandemic::make_ratemat(state, params, sparse = TRUE)
test_model <- flexmodel(
  params, state,
  start_date = "2021-09-09", end_date = "2021-10-09"
)
Then we define sums of state variables and parameters. Here we add the sum of the I states (using a regular expression) and the sum of the ICU states (using a list of state names).
test_model = (test_model
  %>% add_state_param_sum(sum = "Isum", summands = "^I[a-z]")
  %>% add_state_param_sum(sum = "ICUsum", summands = c("ICUs", "ICUd"))
)
Then we can just proceed as we normally would, but with the option to refer to these sums (this is not meant to reflect real epidemiology, but to illustrate the interface).
options(MP_flex_spec_version = "0.0.2")
test_model = (test_model
%>% add_rate("E", "Ia", ~ (alpha) * (sigma) * (1 / Isum))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~ (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~ (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho) / (1 / ICUsum))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
  (Ia) * (beta0) * (1 / N) * (Ca) +
  (Ip) * (beta0) * (1 / N) * (Cp) +
  (Im) * (beta0) * (1 / N) * (Cm) * (1 - iso_m) +
  (Is) * (beta0) * (1 / N) * (Cs) * (1 - iso_s))
%>% add_parallel_accumulators(c("X", "N", "P", "V"))
%>% update_tmb_indices()
)
And here are the indices that are relevant to the tracking of sums of state variables and parameters.
test_model$tmb_indices$sum_indices
## $sumidx
## integer(0)
##
## $sumcount
## integer(0)
##
## $summandidx
## integer(0)
We describe these below.
We introduce three new index/count vectors.
- sumidx – indices, locating the variable to store each sum, into the state_param vector defined in version 0.0.1
- sumcount – vector containing the number of summands required for each sum
- summandidx – indices, locating the summands, into the state_param vector
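A small R sketch of how these three vectors could be used to fill the sums at each iteration; the function name and index values below are invented for illustration, not taken from the package.

```r
# compute each sum from its summands and store it back into state_param
compute_sums <- function(state_param, sumidx, sumcount, summandidx) {
  offset <- 0
  for (i in seq_along(sumidx)) {
    idx <- summandidx[offset + seq_len(sumcount[i])]
    state_param[sumidx[i]] <- sum(state_param[idx])
    offset <- offset + sumcount[i]
  }
  state_param
}

# e.g. two sums: the first adds elements 2 and 3, the second adds elements 4, 5, 6
state_param <- c(0, 1, 2, 3, 4, 5, 0, 0)
compute_sums(state_param, sumidx = c(7, 8), sumcount = c(2, 3), summandidx = c(2, 3, 4, 5, 6))
```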
Updates of time-varying parameters from the initial parameters on the C++ side. This is really a patch of 0.0.4, which didn’t work for calibration because it required updating the objective function (or its environment?) at every iteration of the optimizer. It is just easier to do it on the C++ side, and we need to do this anyways.
The updated time-varying parameter values are passed from C++ to R so that they can be included in the results of run_sim.
Define the sums of the rows of the rate matrix. \[ r_i = \sum_{j=1}^n M_{ij} \] Define the elements of a vector of exponentiated row sums \[ \rho_i = \exp(-r_i) \]
Define the elements of a normalized state vector. \[ \tilde{s}_i = \begin{cases} 0 & r_i = 0 \\ \frac{s_i}{r_i} & \text{otherwise} \end{cases} \] With these definitions we can define the modified flow matrix. \[ F_{ij} = \begin{cases} M_{ij}\tilde{s}_i(1-\rho_i) & i \ne j \\ 0 & \text{otherwise} \end{cases} \] This modified flow matrix can now be used in the same way as the unmodified flow matrix to produce state variable updates following spec version 0.0.2.
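A minimal R sketch of the hazard-based flow matrix, assuming a rate matrix M and state vector s:

```r
# hazard-based flow matrix: F[i,j] = M[i,j] * s_tilde[i] * (1 - rho[i]) for i != j
hazard_flow <- function(M, s) {
  r   <- rowSums(M)                    # r_i: row sums of the rate matrix
  rho <- exp(-r)                       # rho_i = exp(-r_i)
  s_tilde <- ifelse(r == 0, 0, s / r)  # normalized state vector
  # multiplying a matrix by a length-nrow vector scales each row i of M
  flow <- M * (s_tilde * (1 - rho))
  diag(flow) <- 0                      # F_ii = 0
  flow
}
```

When the hazard option is enabled, this modified flow matrix would stand in for the simple flow matrix of spec version 0.0.2 in the state update.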
A Boolean argument needs to be added to the interface, indicating whether or not the hazard-based flow matrix is used rather than the simple flow matrix.
The data structure is equivalent to 0.0.4.
We begin with simple piece-wise constant time-variation. Piece-wise constant means that each time-varying parameter is associated with a sequence of break-points at which the value of the parameter changes – at all other times the parameter is constant. We plan to relax the piece-wise constant restriction in later versions.
Patch of the spec from 0.0.3 on piece-wise constant time-varying parameters.
This version was an early attempt to introduce parameters that can vary at each simulation step. It was never implemented on the C++ side, and has been superseded by version 0.0.4.
In the non-time-varying case of version 0.0.2, we have scalar valued parameters. These parameters are constant in simulation time. We now consider parameters that can vary at any particular simulation step.
Consider a focal scalar-valued time-varying parameter \(u\). In the non-time-varying case this scalar value, \(u\), is an element of the parameter vector, \(\theta\).
\[ \theta = [..., u, ...] \]
In the time-varying case, there is a series of values, \(u_0, u_1, ..., u_{n_u}\), associated with an initial value \(u = u_0\) and the values, \(u_1, ..., u_{n_u}\), at \(n_u\) break-points, \(t_1, ..., t_{n_u}\). We store these additional values by expanding the parameter vector to make room for them.
\[ \theta = [..., u_0, u_1, ..., u_{n_u}, ...] \]
Each simulation step is indexed \(t = 1, ..., T\), where \(T\) is the number of simulation steps. If the time-varying parameter, \(u\), is required at time \(t\) to update an element of the rate-matrix, the value of \(u\) that is used is \(u_i\) such that \(t \in [t_i, t_{i+1})\).
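In R this lookup can be expressed with findInterval; here u_vals holds \(u_0, u_1, ..., u_{n_u}\) and breaks holds \(t_1, ..., t_{n_u}\), with illustrative values only.

```r
# piece-wise constant lookup: use u_i such that t is in [t_i, t_{i+1})
u_at_time <- function(t, u_vals, breaks) {
  u_vals[findInterval(t, breaks) + 1]
}
u_at_time(1:10, u_vals = c(0.9, 0.5, 0.1), breaks = c(4, 8))
# 0.9 for t < 4, 0.5 for 4 <= t < 8, 0.1 for t >= 8
```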
We follow the params_timevar argument to run_sim in McMasterPandemic, which allows the user to specify a data frame with the following columns.
The definition of these columns at the time of writing is given here.
The user must also specify the starting and ending calendar dates of the simulation.
An example model with time variation can be defined as follows.
options(MP_flex_spec_version = "0.0.3")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
flexmodel(
  params, state,
  start_date = '2021-08-26', end_date = "2021-09-26",
timevar_piece_wise = data.frame(
Date = c("2021-09-01", "2021-09-15", "2021-09-10", "2021-08-28"),
Symbol = c("mu", "beta0", "beta0", "mu"),
Value = c(0.5, 0.1, 2, 0.2),
Type = c("rel_prev", "rel_orig", "rel_prev", "rel_orig")
  ))
%>% add_rate("E", "Ia", ~ (alpha) * (sigma))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~
  (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~
  (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
  (Ia) * (beta0) * (1/N) * (Ca) +
  (Ip) * (beta0) * (1/N) * (Cp) +
  (Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
  (Is) * (beta0) * (1/N) * (Cs) * (1-iso_s))
%>% add_parallel_accumulators(c("X", "N", "P", "V"))
%>% update_tmb_indices()
)
Here we make a distinction between indices and time-indices. An index is a 1-based integer identifying a value within a vector. A time-index is a 1-based integer identifying the simulation step.
- tvi_spi: vector of indices for each time varying factor, identifying the elements of the spi vector from version 0.0.1
- n_breaks: vector of the number of time breaks for each time varying factor
- t_breaks: vector of time-indices locating all time breaks for all time varying factors, organized such that time-indices associated with the same factor are grouped together and time-indices are all ascending

These indices and counts can be used to update the spi vector at each simulation step. Here is an example in R.
options(MP_flex_spec_version = "0.0.3")
tvi_spi = c(1L, 3L, 6L, 10L, 14L, 19L)
n_breaks = c(3L, 3L, 2L, 2L, 2L, 2L)
t_breaks = c(
  # factor 1 has 3 break-points
  2L, 6L, 17L,
  # factor 2 has 3 break-points
  2L, 6L, 17L,
  # factor 3 has 2 break-points
  15L, 20L,
  # factor 4 has 2 break-points
  15L, 20L,
  # factor 5 has 2 break-points
  15L, 20L,
  # factor 6 has 2 break-points
  15L, 20L
)
# number of time varying factors
n = length(tvi_spi)
# initialize a break point counter. each element of t_breaks_t
# is associated with a time varying factor. each time a break
# point is encountered, increment the corresponding element in
# this counter
t_breaks_t = rep(0, n)
# indices into the state param vector for all factors, including
# those that are not time varying
spi = c(30L, 27L, 30L, 27L, 3L, 15L, 33L, 18L, 4L, 15L, 33L, 19L, 5L,
        15L, 33L, 20L, 36L, 6L, 15L, 33L, 21L, 36L)
print(spi)
## [1] 30 27 30 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
# loop over time steps
for(t in 1:30) {
# loop over time-varying factors
for(i in 1:n) {
# only do something if factor i has a break-point at time t
    break_now = t == t_breaks[1 + t_breaks_t[i] + sum(n_breaks[seq_len(i - 1)])]
    if(isTRUE(break_now)) {
      # increment the break-point counter
      t_breaks_t[i] = t_breaks_t[i] + 1
      # increment the spi vector for the ith time varying factor
      spi[tvi_spi[i]] = spi[tvi_spi[i]] + 1
    }
  }
  print(spi)
# do the rest of what needs to be done in this simulation step
# (e.g. update the rate matrix, update the state vector)
# ...
}
## [1] 30 27 30 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 31 27 31 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 15 33 18 4 15 33 19 5 15 33 20 36 6 15 33 21 36
## [1] 32 27 32 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 32 27 32 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 16 33 18 4 16 33 19 5 16 33 20 36 6 16 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
## [1] 33 27 33 27 3 17 33 18 4 17 33 19 5 17 33 20 36 6 17 33 21 36
Each row in this output is a time step and each column is the spi index for a particular factor. Notice that only six columns change through time, and these are associated with the six time-varying factors. Columns 1 and 3 are associated with the same time-varying parameter, and so they change at the same break-points. The same is true of columns 6, 10, 14, and 19, which are associated with another time-varying parameter.
Iterated updates of the state vector.
- do_exponential == FALSE, and do_hazard == FALSE
- MP_badsum_action = 'ignore'
Let \(s\) be the state vector and \(M\) be the rate matrix. Define the flows matrix, \(F\), to have the same dimensions as \(M\). The elements of \(F\) are given by the following.
\[ F_{ij} = M_{ij} s_i \]
Let \(f_{\text{in}}\) and \(f_{\text{out}}\) be the inflow and outflow vectors, given by the column sums and row sums of \(F\) respectively. If the user specifies some states as being parallel accumulators, then the columns associated with those special states are removed from \(F\) before the row sums are taken to compute \(f_{\text{out}}\).
With these definitions in place the state vector update at each iteration is given by the following.
\[ s \to s - f_{\text{out}} + f_{\text{in}} \] Here is a summary of the stages of a single simulation step under this model.
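A minimal R sketch of one such update, using a made-up 3-state rate matrix and treating the third state as a parallel accumulator:

```r
# one state-vector update: s <- s - f_out + f_in
step_state <- function(s, M, par_accum_idx = integer(0)) {
  flows <- M * s                  # F[i, j] = M[i, j] * s[i] (length-nrow vector scales rows)
  f_in  <- colSums(flows)         # inflow: column sums
  flows_out <- flows
  if (length(par_accum_idx) > 0)
    flows_out[, par_accum_idx] <- 0  # drop parallel-accumulator columns before row sums
  f_out <- rowSums(flows_out)     # outflow: row sums
  s - f_out + f_in
}

# hypothetical example where state 3 is a parallel accumulator
M <- matrix(c(0, 0.1, 0.1,
              0, 0,   0.2,
              0, 0,   0), nrow = 3, byrow = TRUE)
s <- c(100, 10, 0)
step_state(s, M, par_accum_idx = 3)
```

With these example numbers, 10 individuals leave state 1 for state 2, while the flow into the accumulator (state 3) does not deplete the source states.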
In addition to the interface elements specified in 0.0.1, the user needs to provide some additional information. We need to provide a character vector of regex patterns with which to find the indices of the parallel accumulators amongst the state names (e.g. X, N, P, V). For example, one way to provide such a vector to an existing model would be the following.
options(MP_flex_spec_version = "0.0.2")
model = add_parallel_accumulators(model, c("X", "N", "P", "V"))
The user must specify the number of state vector updates.
The following pieces of information need to be added to those described in Data Structure (0.0.1) in order to make state vector updates.
- Modified copies of the following integer vectors described in 0.0.1: spi, modifier, from, to, count
- A vector, par_accum_indices, of indices into the columns of the flows matrix, identifying the states that are parallel accumulators
- The number of state vector updates to simulate
Update the non-zero elements of a rate matrix on the TMB/C++ side using a restricted set of operations (complements, inverses, sums, and products). The update formula must be specified separately for each non-zero element.
- A param_pansim object and a state_pansim object exist or can be constructed
- If make_state is used to create the state_pansim object, vaxify, ageify, and testify are all set to FALSE
Any element, \(x\), of either the parameter or state vector can be used to define a factor in one of the following three forms: \(x\), \(1-x\), or \(1/x\).
We collect these user-defined factors into a factor vector, \(y\). Factors can be repeated in \(y\) if required. Any number of factors can be multiplied together using * to produce a product. Any number of factors and products can be added together using +.
There is a higher level nested structure associated with the factor vector, \(y\).
With these definitions, the dependence of any non-zero rate matrix element on the parameters and state variables is given by the following expression.
\[ M_{(i)} = \sum_{j=1}^{n_i} \prod_{k=1}^{m_{ij}} y_{ijk} \]
where \(y_{ijk}\) is the \(k\)th factor associated with the \(j\)th product associated with the \(i\)th non-zero rate matrix element.
Users can define the structure of a rate matrix with a list of expressions, each determining the parameter-dependence and/or state-dependence of a single non-zero rate matrix element. The standard McMasterPandemic introductory example model can be defined as follows.
options(MP_flex_spec_version = "0.0.1")
params = read_params('ICU1.csv')
state = make_state(params = params)
model = (
flexmodel(params, state)
%>% add_rate("E", "Ia", ~ (alpha) * (sigma))
%>% add_rate("E", "Ip", ~ (1 - alpha) * (sigma))
%>% add_rate("Ia", "R", ~ (gamma_a))
%>% add_rate("Ip", "Im", ~ (mu) * (gamma_p))
%>% add_rate("Ip", "Is", ~ (1 - mu) * (gamma_p))
%>% add_rate("Im", "R", ~ (gamma_m))
%>% add_rate("Is", "H", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("Is", "ICUs", ~
  (1 - nonhosp_mort) * (1 - phi1) * (1 - phi2) * (gamma_s))
%>% add_rate("Is", "ICUd", ~
  (1 - nonhosp_mort) * (1 - phi1) * (phi2) * (gamma_s))
%>% add_rate("Is", "D", ~ (nonhosp_mort) * (gamma_s))
%>% add_rate("ICUs", "H2", ~ (psi1))
%>% add_rate("ICUd", "D", ~ (psi2))
%>% add_rate("H2", "R", ~ (psi3))
%>% add_rate("H", "R", ~ (rho))
%>% add_rate("Is", "X", ~ (1 - nonhosp_mort) * (phi1) * (gamma_s))
%>% add_rate("S", "E", ~
  (Ia) * (beta0) * (1/N) * (Ca) +
  (Ip) * (beta0) * (1/N) * (Cp) +
  (Im) * (beta0) * (1/N) * (Cm) * (1-iso_m) +
  (Is) * (beta0) * (1/N) * (Cs) * (1-iso_s))
%>% update_tmb_indices()
)
The first step, flexmodel, in this pipeline initializes the compartmental model by specifying objects containing the parameters and state variables respectively.
Each subsequent call to add_rate defines the dependence of a single non-zero rate matrix element on parameters and state variables. When used in a pipeline, the add_rate function takes three arguments.

- from – character string describing the state associated with the row of the rate matrix
- to – character string describing the state associated with the column of the rate matrix
- formula – one-sided model formula defining the dependence based on symbols associated with the named parameter and state vectors
- Any single parameter or state variable name, x, can be placed in parentheses to produce a factor in the following three ways: (x), (1-x), (1/x)
- Factors can be multiplied together with the * operator to produce a product
- Products and factors can be added together with the + operator to produce a non-zero rate matrix element
The final step of the pipeline adds the indices to be passed to TMB/C++ that are described in the Data Structure section below.
The object created by the pipeline in the User Interface section above must contain at a minimum the following seven objects.
1. The initial value of a state_pansim object representing the vector, state, containing the state variables
2. The initial value of a param_pansim object representing the vector, params, containing the parameters
3. An integer vector, spi, of 1-based indices into a concatenation, state_param, of state and param
   - The ordering of state_param is arbitrary because several integer vectors defined below contain indices into these elements for book-keeping purposes on the TMB/C++ side
   - On the R side, spi has the property that state_param[spi] returns a vector of state variables and parameters in the order in which they occur in the factor vector defined above in the Mathematical Description section
   - state_param[spi] does not return the factor vector itself – to get the factor vector from state_param[spi] you would need to modify some of the elements by taking their complements and inverses
4. An integer vector, modifier, of the same length as spi
   - Each element of modifier corresponds to a factor, as defined above in the Mathematical Description
   - These elements bit-wise encode several pieces of information
     - which elements of state_param[spi] need to be transformed by complements or inverses
     - which elements of state_param[spi] need to be multiplied together to take products
   - The binary expansion of these elements can each be represented with three bits, with each bit encoding different information
     - 1 if the corresponding factor is the first in a product that needs to be added to a previous product, and 0 otherwise
     - 1 if the corresponding factor requires a complement and 0 otherwise
     - 1 if the corresponding factor requires an inverse and 0 otherwise
5. An integer vector, from, of 1-based indices, the \(i\)th element of which points to the row of the \(i\)th non-zero rate matrix element
6. An integer vector, to, of 1-based indices, the \(i\)th element of which points to the column of the \(i\)th non-zero rate matrix element
7. An integer vector, count, the \(i\)th element of which gives the number of factors used to calculate the \(i\)th non-zero rate matrix element
   - The elements of count store these sums \(\sum_{j=1}^{n_i}m_{ij}\)
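To make the bitwise encoding concrete, here is an R sketch that decodes a modifier vector and evaluates one non-zero rate matrix element from state_param and spi. The particular bit assignment (1 = start of a new product term, 2 = complement, 4 = inverse) and all numeric values are illustrative assumptions, not the implemented encoding; for several elements, count would partition the factor vector into per-element groups.

```r
# decode a hypothetical modifier encoding and rebuild one rate matrix element
eval_rate_element <- function(vals, modifier) {
  total <- 0
  prod_acc <- NA_real_
  for (k in seq_along(vals)) {
    x <- vals[k]
    if (bitwAnd(modifier[k], 2L) > 0) x <- 1 - x      # complement
    if (bitwAnd(modifier[k], 4L) > 0) x <- 1 / x      # inverse
    new_term <- (k == 1) || bitwAnd(modifier[k], 1L) > 0
    if (new_term) {
      if (!is.na(prod_acc)) total <- total + prod_acc # add the finished product
      prod_acc <- x
    } else {
      prod_acc <- prod_acc * x
    }
  }
  total + prod_acc
}

# e.g. the element (1 - alpha) * (sigma) + (1/N) * (beta0)
state_param <- c(alpha = 0.4, sigma = 0.2, N = 1000, beta0 = 1.5)
spi <- c(1, 2, 3, 4)
modifier <- c(2L, 0L, 5L, 0L)  # complement; plain; new term + inverse; plain
eval_rate_element(state_param[spi], modifier)  # (1 - 0.4) * 0.2 + (1/1000) * 1.5
```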