Vector-mediated dispersal kernels

I wrote this post to help a friend a couple of years ago (og post), and I am finally getting to organize this into my brand new website! 🤩

At the time, Sarah was working on her dissertation with salamanders and the fungal pathogen Batrachochytrium salamandrivorans (Bsal). Specifically, we talked about generating dispersal kernels for the pathogen, thinking of humans walking on trails and carrying the pathogen on their boots. Sarah did some really fun experiments getting boots in the mud and taking steps, to quantify the amount of pathogen that would get onto the boots, and then dispersed by a person walking.

When you have available data, the usual goal is to fit an equation (or model) that describes the probability of dispersal as a function of distance. In this case, we are thinking of human mediated dispersal (HMD) of a specific fungal pathogen. The experiments were used to calculate the amount of pathogen that gets dispersed in hiking books after being exposed to an initial inoculum. The starting inoculum had a concentration of 10^5 zoospores in water. Boots step on the inoculated water source, and walk for s meters, where s=1,2,...,5. In this example we were trying to quantify the dispersal kernel for human mediated spore dispersal.

The main question here is over what range of distances can fungal spores be dispersed by humans on their boots? And what function would describe the relationship between concentration of those spores and distance walked.

From the initial meeting with Sarah, I understood that we start with 10^5 zoospores in a water vessel, and essentially estimate a pick up rate. This means that data points at distance zero (d=0) are a proportion of the initial starter. From here, we would like to estimate the proportion of zoospores left on the shoe vector, and thus the proportion of zoospores transferred to the soil as distance from the starter increases.

A common approach to this is to use an exponential model with:

f(d) = a \text{e}^{-bd}

where the proportion of zoospores left on the boot (f(d)) is a function of the initial number of zoospores at the starting point (in this case the pick up rate as a proportion of the total at distance d=0). We assume that zoospores are transferred from the boot to the soil at a constant rate b, as shown in the equation above.

For algebra reasons:

y = a \text{e}^{-bx} \\ log(y) = log(a \cdot \text{e}^{-bx}) \\ log(y) = log(a) + log(\text{e}^{-bx}) \\ log(y) = A - bx

To fit this model in R, you can just fit a linear model, where your response variable is log(y), with an intercept A, and a slope -b.

As an example, let’s generate some random data from the perspective of this experiment. The starting point:

library(tidyverse)

# From what I remember these seemed to be the average values
# for the number of zoospores
# recovered from the boot
y_mean <- c(160, 100, 60, 20, 5, 2)
y_sd <- c(0, 30, 10, 5, 5, 5)

# And these are the number of data points for each of the distances
d <- c(0,1,1,1,2,2,2,2,3,3,3,4,4,5,5,5)

set.seed(86923)
#simulate some data
y <- NULL
for(i in 1:length(d)){
  val_d <- d[i]
  gen_y <- rnorm(1, mean = y_mean[val_d+1], sd = y_sd[val_d+1])
  y[i] <- gen_y
}

plot(d, y)

We have some simulated data now, and now will fit an exponential function to it:

# Save my simulated data as an object
my_df <- data.frame(d = d, y = y)

# The model looks at the proportion of zoospores that remain on the boot
pick_up <- my_df$y[which(my_df$d==0)] # number of zoospores picked up from source

my_df |> 
  mutate(prop_y = y/pick_up) -> my_df

# plot(my_df$d, my_df$prop_y)

# Fit the function as a linear model:

m1 <- lm(log(prop_y) ~ d, data = my_df)
broom::tidy(m1)

# A tibble: 2 × 5
  term        estimate std.error statistic      p.value
  <chr>          <dbl>     <dbl>     <dbl>        <dbl>
1 (Intercept)    0.326    0.213       1.53 0.149       
2 d             -0.774    0.0690    -11.2  0.0000000221

We can look at the results from this model and see that the intercept (a) is giving us the starting value for d=0, for the proportion of zoospores remaining on the boot. Considering this is in a log scale, we transform back to get the actual proportion:

exp(m1$coefficients[1])

(Intercept) 
   1.385001

But we also have to remember that this is a proportion of zoospores remaining given the initial pickup rate. So to get an actual number of zoospores at a certain distance, in this case d=0, we have to multiply by the initial pick up rate and we get

exp(m1$coefficients[1]) * pick_up

(Intercept) 
   221.6001

as the number of zoospores picked up by the boot at the source. This model assumes a constant decay rate -0.7742497, which corresponds to b.

We can use these coefficients to predict the proportion of zoospores remaining on the boot at any given distance d. For example:

# predictions vector
d_pred <- seq(from = 2, to = 10, by = 0.5) # in meters

# Use the model to generate predictions for these new distance values
y_log_pred <- predict(m1, list(d = d_pred))
# transform back to get proportions
y_pred_prop <- exp(y_log_pred)

# transform to number of zoospores using the initial pick up rate
y_pred<- y_pred_prop * pick_up

# Get one data frame to visualize
pred_df <- data.frame(d = d_pred, y = y_pred, prop_y = y_pred_prop, type = "pred")

#| warning: false

my_df |> 
  mutate(type = "og") |> 
  bind_rows(pred_df) -> combined_df

combined_df |> 
  ggplot(aes(x = d, y = y, color = type)) +
   geom_point(alpha = 0.7) +
   theme_classic() +
  labs(y = "Zoospores on boot", x = "Distance in meters") -> nzoos_fig


combined_df |> 
  ggplot(aes(x = d, y = prop_y, color = type)) +
  geom_point(alpha = 0.7) +
  theme_classic() +
  labs(y = "Proportion of zoospores remaining on boot",
       x = "Distance in meters",
       color = "") -> prop_fig

my_legend <- cowplot::get_legend(prop_fig)

Warning in get_plot_component(plot, "guide-box"): Multiple components found;
returning the first one. To return all, use `return_all = TRUE`.

cowplot::plot_grid(
   nzoos_fig + theme(legend.position = "none"),
   NULL,
   prop_fig + theme(legend.position = "none"),
   my_legend,
   ncol = 4,
   rel_widths = c(1, 0.1, 1, 0.2)
)

We can also fit this using nonlinear least squares with the same starting dataset:

slice_sample(my_df, n = 6)

  d          y     prop_y
1 3  22.117988 0.13823743
2 3  24.029967 0.15018730
3 2  68.879803 0.43049877
4 5   7.803433 0.04877146
5 1  68.505725 0.42816078
6 0 160.000000 1.00000000

m2 <- nls(prop_y ~ I(a * exp(b * d)), data = my_df, start=list(a=1, b=0))

We can also do a prediction for the same predicted distance values as before:

plot(d_pred, predict(m2, list(d = d_pred)), col = "darkgreen", type = "l")

broom::tidy(m2)

# A tibble: 2 × 5
  term  estimate std.error statistic  p.value
  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 a        1.05     0.0690      15.2 4.30e-10
2 b       -0.567    0.0514     -11.0 2.71e- 8

And calculate some goodness of fit values:

rss.p <- sum(residuals(m2)^2)  # Residual sum of squares
tss <- sum((my_df$prop_y - mean(my_df$prop_y))^2) # Total sum of squares
1 - (rss.p/tss)  # R-squared measure

[1] 0.9321556

This R-squared measure tells you how much of the variation in the data is explained by your independent variable in the regression model. In our case, it tells us 93.2% of the variation in the proportion of zoospores remaining on the boot is explained by the distance variable.

To wrap up, if we consider the model:

y = a\text{e}^{-bx} where y is the proportion of zoospores that remain on the boot and x is the distance in meters from the source of the pathogen, then a tells us the starting number of zoospores picked up by the boot at the source, and b tells us the rate at which zoospores are transferred from the boot to the soil.
If we use the second model’s values, then we can ask the question, what proportion of zoospores would remain on the hiker’s boot after walking for 20 meters?

a <- summary(m2)$coefficients[1]
b <- summary(m2)$coefficients[2]

get_prop_y <- function(x, a, b){
  a*exp(b*x)
}


get_prop_y(20, a, b)

[1] 1.243764e-05

this should be the same result using the predict() function:

predict(m2, list(d = 20))

[1] 1.243764e-05

Citation

BibTeX citation:

@online{rudolph2023,
  author = {Rudolph, Francisca Javiera},
  title = {Vector-Mediated Dispersal Kernels},
  date = {2023-02-13},
  url = {https://javirudolph.github.io/posts/2023-02-13-on-dispersal-kernels/},
  langid = {en}
}

For attribution, please cite this work as:

Rudolph, Francisca Javiera. 2023. “Vector-Mediated Dispersal Kernels.” February 13, 2023. https://javirudolph.github.io/posts/2023-02-13-on-dispersal-kernels/.