Simulation workflows for disease progress curves

library(epifitter)
library(ggplot2)
library(cowplot)
theme_set(cowplot::theme_half_open(font_size = 12))

Overview

The sim_*() functions (sim_exponential(), sim_monomolecular(), sim_logistic(), and sim_gompertz()) create synthetic disease progress curves from the same four model families that the fitting functions use.

Simulations are useful for teaching, testing analysis workflows, checking whether a model can recover known parameters, and generating reproducible examples. They are not a substitute for biological validation. Choose parameter values that make sense for the host-pathogen system, disease metric, and time scale.

Simulate four canonical curve shapes

The four model families represent different epidemic shapes. Exponential growth is most appropriate for early unconstrained increase, while monomolecular, logistic, and Gompertz curves include different forms of deceleration as disease approaches an upper limit.
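For reference, the deterministic part of each family can be written in closed form. The sketch below uses the standard textbook forms, with disease expressed as a proportion and an upper limit of 1. The `curve_*()` helpers are hypothetical names for illustration, not epifitter functions, and the package's internals may differ in detail.

```r
# Hypothetical helpers (not part of epifitter): standard closed-form
# solutions with disease intensity y as a proportion and a maximum of 1.
curve_exponential   <- function(t, y0, r) y0 * exp(r * t)
curve_monomolecular <- function(t, y0, r) 1 - (1 - y0) * exp(-r * t)
curve_logistic      <- function(t, y0, r) 1 / (1 + ((1 - y0) / y0) * exp(-r * t))
curve_gompertz      <- function(t, y0, r) exp(log(y0) * exp(-r * t))

# Evaluate all four curves on a common time grid
time <- seq(0, 100, by = 5)
curves <- data.frame(
  time          = time,
  exponential   = curve_exponential(time, y0 = 0.01, r = 0.045),
  monomolecular = curve_monomolecular(time, y0 = 0.01, r = 0.05),
  logistic      = curve_logistic(time, y0 = 0.01, r = 0.10),
  gompertz      = curve_gompertz(time, y0 = 0.01, r = 0.07)
)
head(curves)
```

Only the exponential form lacks an upper limit, so it can exceed 1 when `r * t` is large. That is one reason to restrict it to the early phase of an epidemic.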

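# N: duration of the epidemic; dt: interval between assessments;
# y0: initial disease intensity; r: rate parameter;
# alpha: noise level of the replicates; n: replicates per assessment time
exp_model <- sim_exponential(N = 100, y0 = 0.01, dt = 10, r = 0.045, alpha = 0.2, n = 5)
mono_model <- sim_monomolecular(N = 100, y0 = 0.01, dt = 5, r = 0.05, alpha = 0.2, n = 5)
log_model <- sim_logistic(N = 100, y0 = 0.01, dt = 5, r = 0.10, alpha = 0.2, n = 5)
gomp_model <- sim_gompertz(N = 100, y0 = 0.01, dt = 5, r = 0.07, alpha = 0.2, n = 5)

# Plot each underlying curve (line) with its noisy replicates (points)
exp_plot <- ggplot(exp_model, aes(time, y)) +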
  geom_jitter(aes(y = random_y), width = 0.1, color = "#6c757d") +
  geom_line(color = "#b56576", linewidth = 0.8) +
  labs(title = "Exponential")

mono_plot <- ggplot(mono_model, aes(time, y)) +
  geom_jitter(aes(y = random_y), width = 0.1, color = "#6c757d") +
  geom_line(color = "#588157", linewidth = 0.8) +
  labs(title = "Monomolecular")

log_plot <- ggplot(log_model, aes(time, y)) +
  geom_jitter(aes(y = random_y), width = 0.1, color = "#6c757d") +
  geom_line(color = "#355070", linewidth = 0.8) +
  labs(title = "Logistic")

gomp_plot <- ggplot(gomp_model, aes(time, y)) +
  geom_jitter(aes(y = random_y), width = 0.1, color = "#6c757d") +
  geom_line(color = "#8d5a97", linewidth = 0.8) +
  labs(title = "Gompertz")

# Arrange the four panels in a 2 x 2 grid
plot_grid(exp_plot, mono_plot, log_plot, gomp_plot, ncol = 2)

Figure: Grid of four simulated disease progress curves showing exponential, monomolecular, logistic, and Gompertz shapes.

Send simulated data into the fitting pipeline

fit_from_sim <- fit_lin(time = log_model$time, y = log_model$random_y)
fit_from_sim$stats_all
## # A tibble: 4 × 14
##   best_model model      r    r_se r_ci_lwr r_ci_upr    v0  v0_se r_squared   RSE
##        <int> <chr>  <dbl>   <dbl>    <dbl>    <dbl> <dbl>  <dbl>     <dbl> <dbl>
## 1          1 Logi… 0.100  6.27e-4   0.0989   0.101  -4.59 0.0366     0.996 0.194
## 2          2 Gomp… 0.0717 1.45e-3   0.0688   0.0746 -2.38 0.0845     0.960 0.449
## 3          3 Mono… 0.0554 1.97e-3   0.0515   0.0593 -1.08 0.115      0.884 0.612
## 4          4 Expo… 0.0448 1.97e-3   0.0409   0.0487 -3.51 0.115      0.833 0.613
## # ℹ 4 more variables: CCC <dbl>, y0 <dbl>, y0_ci_lwr <dbl>, y0_ci_upr <dbl>

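When using simulations for method checking, compare the fitted model with the known data-generating model. In the output above, the logistic model ranks first and its estimated rate (r = 0.100, interval 0.0989 to 0.101) agrees with the value passed to sim_logistic() (r = 0.10). When using simulations for teaching, show both the underlying curve (y) and the noisy observations (random_y) so users can distinguish the epidemic process from the observation process.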
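The method-checking idea can also be sketched without the package: simulate a logistic curve with a known rate, linearize it with the logit transform, and see whether a straight-line fit returns that rate. This is a minimal base R sketch. The noise model (normal error on the logit scale), the seed, and all object names are assumptions chosen for illustration; they do not reproduce the noise generated by sim_logistic() or the exact computations inside fit_lin().

```r
set.seed(123)  # fixed seed so the example is reproducible

# Known data-generating parameters (same values as the logistic example)
true_r  <- 0.10
true_y0 <- 0.01

# Five replicates at each assessment time
time <- rep(seq(0, 100, by = 5), each = 5)

# A logistic curve is a straight line on the logit scale:
# logit(y) = logit(y0) + r * t
logit_true <- log(true_y0 / (1 - true_y0)) + true_r * time

# Assumed observation noise: normal error added on the logit scale,
# which keeps the back-transformed values strictly inside (0, 1)
logit_obs <- logit_true + rnorm(length(time), mean = 0, sd = 0.2)
y_obs <- plogis(logit_obs)

# Linear regression of logit(y) on time: the slope estimates r and
# the back-transformed intercept estimates y0
fit    <- lm(qlogis(y_obs) ~ time)
est_r  <- unname(coef(fit)["time"])
est_y0 <- unname(plogis(coef(fit)["(Intercept)"]))
ci_r   <- confint(fit)["time", ]

# Compare the estimate and its interval with the value used to simulate
c(true_r = true_r, est_r = est_r, lwr = ci_r[[1]], upr = ci_r[[2]])
```

If the interval repeatedly misses `true_r` across many seeds, that suggests a problem in the analysis workflow rather than in the data, which is the kind of failure a simulation check is meant to expose.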