library(tidyverse)
I wrote this post to help a friend a couple of years ago (og post), and I am finally getting to organize this into my brand new website! 🤩
At the time, Sarah was working on her dissertation with salamanders and the fungal pathogen Batrachochytrium salamandrivorans (Bsal). Specifically, we talked about generating dispersal kernels for the pathogen, thinking of humans walking on trails and carrying the pathogen on their boots. Sarah did some really fun experiments getting boots in the mud and taking steps, to quantify the amount of pathogen that would get onto the boots, and then dispersed by a person walking.
When you have available data, the usual goal is to fit an equation (or model) that describes the probability of dispersal as a function of distance. In this case, we are thinking of human mediated dispersal (HMD) of a specific fungal pathogen. The experiments were used to calculate the amount of pathogen that gets dispersed in hiking books after being exposed to an initial inoculum. The starting inoculum had a concentration of 10^5 zoospores in water. Boots step on the inoculated water source, and walk for s meters, where s=1,2,...,5. In this example we were trying to quantify the dispersal kernel for human mediated spore dispersal.
The main question here is over what range of distances can fungal spores be dispersed by humans on their boots? And what function would describe the relationship between concentration of those spores and distance walked.
From the initial meeting with Sarah, I understood that we start with 10^5 zoospores in a water vessel, and essentially estimate a pick up rate. This means that data points at distance zero (d=0) are a proportion of the initial starter. From here, we would like to estimate the proportion of zoospores left on the shoe vector, and thus the proportion of zoospores transferred to the soil as distance from the starter increases.
A common approach to this is to use an exponential model with:
f(d) = a \text{e}^{-bd}
where the proportion of zoospores left on the boot (f(d)) is a function of the initial number of zoospores at the starting point (in this case the pick up rate as a proportion of the total at distance d=0). We assume that zoospores are transferred from the boot to the soil at a constant rate b, as shown in the equation above.
For algebra reasons:
y = a \text{e}^{-bx} \\ log(y) = log(a \cdot \text{e}^{-bx}) \\ log(y) = log(a) + log(\text{e}^{-bx}) \\ log(y) = A - bx
To fit this model in R, you can just fit a linear model, where your response variable is log(y), with an intercept A, and a slope -b.
As an example, let’s generate some random data from the perspective of this experiment. The starting point:
# From what I remember these seemed to be the average values
# for the number of zoospores
# recovered from the boot
<- c(160, 100, 60, 20, 5, 2)
y_mean <- c(0, 30, 10, 5, 5, 5)
y_sd
# And these are the number of data points for each of the distances
<- c(0,1,1,1,2,2,2,2,3,3,3,4,4,5,5,5)
d
set.seed(86923)
#simulate some data
<- NULL
y for(i in 1:length(d)){
<- d[i]
val_d <- rnorm(1, mean = y_mean[val_d+1], sd = y_sd[val_d+1])
gen_y <- gen_y
y[i]
}
plot(d, y)
We have some simulated data now, and now will fit an exponential function to it:
# Save my simulated data as an object
<- data.frame(d = d, y = y)
my_df
# The model looks at the proportion of zoospores that remain on the boot
<- my_df$y[which(my_df$d==0)] # number of zoospores picked up from source
pick_up
|>
my_df mutate(prop_y = y/pick_up) -> my_df
# plot(my_df$d, my_df$prop_y)
# Fit the function as a linear model:
<- lm(log(prop_y) ~ d, data = my_df)
m1 ::tidy(m1) broom
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.326 0.213 1.53 0.149
2 d -0.774 0.0690 -11.2 0.0000000221
We can look at the results from this model and see that the intercept (a) is giving us the starting value for d=0, for the proportion of zoospores remaining on the boot. Considering this is in a log scale, we transform back to get the actual proportion:
exp(m1$coefficients[1])
(Intercept)
1.385001
But we also have to remember that this is a proportion of zoospores remaining given the initial pickup rate. So to get an actual number of zoospores at a certain distance, in this case d=0, we have to multiply by the initial pick up rate and we get
exp(m1$coefficients[1]) * pick_up
(Intercept)
221.6001
as the number of zoospores picked up by the boot at the source. This model assumes a constant decay rate -0.7742497, which corresponds to b.
We can use these coefficients to predict the proportion of zoospores remaining on the boot at any given distance d. For example:
# predictions vector
<- seq(from = 2, to = 10, by = 0.5) # in meters
d_pred
# Use the model to generate predictions for these new distance values
<- predict(m1, list(d = d_pred))
y_log_pred # transform back to get proportions
<- exp(y_log_pred)
y_pred_prop
# transform to number of zoospores using the initial pick up rate
<- y_pred_prop * pick_up
y_pred
# Get one data frame to visualize
<- data.frame(d = d_pred, y = y_pred, prop_y = y_pred_prop, type = "pred") pred_df
#| warning: false
|>
my_df mutate(type = "og") |>
bind_rows(pred_df) -> combined_df
|>
combined_df ggplot(aes(x = d, y = y, color = type)) +
geom_point(alpha = 0.7) +
theme_classic() +
labs(y = "Zoospores on boot", x = "Distance in meters") -> nzoos_fig
|>
combined_df ggplot(aes(x = d, y = prop_y, color = type)) +
geom_point(alpha = 0.7) +
theme_classic() +
labs(y = "Proportion of zoospores remaining on boot",
x = "Distance in meters",
color = "") -> prop_fig
<- cowplot::get_legend(prop_fig) my_legend
Warning in get_plot_component(plot, "guide-box"): Multiple components found;
returning the first one. To return all, use `return_all = TRUE`.
::plot_grid(
cowplot+ theme(legend.position = "none"),
nzoos_fig NULL,
+ theme(legend.position = "none"),
prop_fig
my_legend,ncol = 4,
rel_widths = c(1, 0.1, 1, 0.2)
)
We can also fit this using nonlinear least squares with the same starting dataset:
slice_sample(my_df, n = 6)
d y prop_y
1 3 22.117988 0.13823743
2 3 24.029967 0.15018730
3 2 68.879803 0.43049877
4 5 7.803433 0.04877146
5 1 68.505725 0.42816078
6 0 160.000000 1.00000000
<- nls(prop_y ~ I(a * exp(b * d)), data = my_df, start=list(a=1, b=0)) m2
We can also do a prediction for the same predicted distance values as before:
plot(d_pred, predict(m2, list(d = d_pred)), col = "darkgreen", type = "l")
::tidy(m2) broom
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 1.05 0.0690 15.2 4.30e-10
2 b -0.567 0.0514 -11.0 2.71e- 8
And calculate some goodness of fit values:
<- sum(residuals(m2)^2) # Residual sum of squares
rss.p <- sum((my_df$prop_y - mean(my_df$prop_y))^2) # Total sum of squares
tss 1 - (rss.p/tss) # R-squared measure
[1] 0.9321556
This R-squared measure tells you how much of the variation in the data is explained by your independent variable in the regression model. In our case, it tells us 93.2% of the variation in the proportion of zoospores remaining on the boot is explained by the distance variable.
To wrap up, if we consider the model:
y = a\text{e}^{-bx}
where y is the proportion of zoospores that remain on the boot and x is the distance in meters from the source of the pathogen, then a tells us the starting number of zoospores picked up by the boot at the source, and b tells us the rate at which zoospores are transferred from the boot to the soil.
If we use the second model’s values, then we can ask the question, what proportion of zoospores would remain on the hiker’s boot after walking for 20 meters?
<- summary(m2)$coefficients[1]
a <- summary(m2)$coefficients[2]
b
<- function(x, a, b){
get_prop_y *exp(b*x)
a
}
get_prop_y(20, a, b)
[1] 1.243764e-05
this should be the same result using the predict()
function:
predict(m2, list(d = 20))
[1] 1.243764e-05
Citation
@online{rudolph2023,
author = {Rudolph, Francisca Javiera},
title = {Vector-Mediated Dispersal Kernels},
date = {2023-02-13},
url = {https://javirudolph.github.io/posts/2023-02-13-on-dispersal-kernels/},
langid = {en}
}