Compilation of resources for a reproducible workflow in R
start2finish is a project I created to share resources and provide a simple starting place for building a reproducible project. When I created this, I had my mentors and peers in mind, hoping that this guide would take the fear out of creating an R package for their own projects.
Have you ever run old code only to have it break? Or found the code associated with a scientific article and, when trying to understand or run it, realized it was almost impossible to figure out?
Like many ecologists, I started my journey with R and data analysis as a self-directed adventure, first learning to code and only later realizing the importance of reproducibility. For R programming in particular, there is an overwhelming number of resources on reproducible research. This poster is meant to be a resource: a short guide and starting point for setting up a reproducible workflow in R.
Most of the workflow relies on the usethis package, and you can find a short tutorial on building packages here.
Perhaps this depends on the type of work you do; I’m not an expert and don’t have particularly strong opinions. However, a package makes you follow certain conventions that keep things organized. I am a fan of writing functions in an R package and writing detailed documentation using the ‘roxygen2’ package. Building a package is a nice way to keep things together, organized, and clear, although I am sure that you can create a chaotic package too.
Although you don’t need Git and GitHub to set up a project in R, maintaining version control is highly recommended and fundamental for reproducibility. It means that there is a history for your code and analysis. Connecting Git and GitHub to RStudio is system dependent; a good resource for this process can be found at happygitwithr.com.
Before you run these steps, make sure you have installed the following packages: usethis, roxygen2, renv, here.
Create your package. More information on the usethis package can be found here. Running this function will open a new R session with your package!
usethis::create_package("your package path")
Track your package versions with the renv package.
renv::init()
renv::snapshot()
Use the here package and avoid starting scripts with setwd("your/specific/path/that/does/not/work/on/another/computer"). I will be honest, I had a hard time understanding this package until I ran across Jenny Richmond’s post on how to use the here package. It comes down to the difference in file paths between .R and .Rmd files.
here::here()
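As a minimal sketch of why this helps, assume a hypothetical file at data/raw/observations.csv inside your project. here::here() builds the path from the project root, whereas a relative path is resolved from the working directory, which for a knitted .Rmd file is by default the folder containing that .Rmd file.

```r
# Hypothetical file location, used only for illustration:
# <project root>/data/raw/observations.csv
data_path <- here::here("data", "raw", "observations.csv")

# The path is anchored at the project root, so the same line works
# from a script in R/ and from an .Rmd file in another subfolder.
print(data_path)

# Read the file only if it actually exists on this machine.
if (file.exists(data_path)) {
  observations <- read.csv(data_path)
}
```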
Use dplyr or base R to clean your data using R scripts. Any changes or deletions made directly in a spreadsheet are lost and forgotten in the realm of non-reproducible clicks. Clean your data with scripts so that you can always go back to the original and be certain of what changes have been made during the cleanup. Broman & Woo, 2018 has great advice on working with spreadsheets.
Write your manuscript with rmarkdown. There are several packages out there that use rmarkdown and will help set up different types of articles. You can even create presentations with rmarkdown. For simplicity, if using rmarkdown and version control (Git and GitHub), you can avoid having several final.docx versions of your work.
When I am starting a new project, I follow these steps:
usethis::create_package("projects/mypackage")
usethis::use_mit_license("Your Name")
usethis::use_git()
usethis::use_github()
usethis::use_readme_rmd()
These steps will create my package, my GitHub repo, and a README written in rmarkdown so that I can include chunks of code and figures in it. After that setup I will start tracking my packages:
renv::init()
renv::snapshot()
I will add some of the packages I know I will use in my work as dependencies:
# use_package() takes one package at a time
for (pkg in c("dplyr", "ggplot2", "fitdistrplus")) usethis::use_package(pkg)
And then save the changes with
renv::snapshot()
After the snapshot, you can commit your changes and push them to your repo so that your lockfile (renv.lock) is updated. Any time you add new packages, repeat these steps.
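The lockfile pays off when someone else (or you, on another computer) opens the project. Here is a minimal sketch of that step, assuming renv is installed and the repository has already been cloned. The helper name restore_project_library is hypothetical, not part of renv.

```r
# Hypothetical helper: reinstall the package versions recorded in renv.lock.
# `project` is the path to the cloned project folder.
restore_project_library <- function(project = ".") {
  # prompt = FALSE skips the interactive confirmation
  renv::restore(project = project, prompt = FALSE)
}

# Usage, from inside the cloned project:
# restore_project_library()
```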
You can create your first script, add a function with a description, and document it with roxygen2. You can find a short tutorial here.
usethis::use_r("name of your script")
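To give a sense of what goes into that script, here is a minimal sketch of a function documented with roxygen2 comments. The function standardize is a made-up example, not part of this project; the lines starting with #' are the roxygen2 tags.

```r
#' Standardize a numeric vector
#'
#' Centers a numeric vector on its mean and scales it by its
#' standard deviation.
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be ignored when computing
#'   the mean and standard deviation?
#'
#' @return A numeric vector the same length as `x`.
#' @export
#'
#' @examples
#' standardize(c(1, 2, 3))
standardize <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / stats::sd(x, na.rm = na.rm)
}
```

Running roxygen2::roxygenise() from the package folder (or devtools::document(), if you use devtools) should then turn these comments into a help page for the function.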
This setup is intended to help you take the leap and get started. There are a number of resources out there, perhaps too many sometimes. If you’d like to jump over to “how do I write my manuscript in rmarkdown”, you should definitely check out Anna Krystalli’s Reproduce a paper in Rmd and follow some of the resources below.