There is a strong software engineering culture in the R developer community. We recommend creating, updating and vetting packages as well as keeping up with community standards. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers.
The R programming language was originally created for statisticians, by statisticians, but evolved over time to attract a “massive pool of talent that was previously untapped” (Hadley Wickham in Thieme (2018)). Although most R users are academic researchers and business data analysts without a background in software engineering, we are witnessing a rapid rise in software engineering within the community. In this comment we spotlight recent progress in tooling, dissemination and support, including specific efforts led by the rOpenSci project. We hope that readers will take advantage of, and participate in, the tools and practices we describe.
The basic infrastructure for creating, building, installing, and checking packages has been in place since the early days of the R language. In those early years (1998-2011), however, the barriers to entry were very high, and access to support and Q&A for beginners was extremely limited. With the introduction of the devtools (Wickham et al. 2021b) package in 2011, the process of creating and updating packages became substantially easier. Documentation also became simpler to maintain: the roxygen2 (Wickham et al. 2021a) package allowed developers to keep documentation in sync with changes in code, similar to the doxygen approach embraced in more mature languages. As these tools combined with the rise in popularity of StackOverflow and the growth of rstats blogs, the number of packages on the Comprehensive R Archive Network (CRAN) skyrocketed from 400 new packages in 2010 to 1000 new packages by 2014. As of this writing, there are nearly 19k packages on CRAN.
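As an illustration of how roxygen2 keeps documentation next to the code it describes, here is a minimal sketch. The function `celsius_to_fahrenheit()` is a hypothetical example and does not come from any package discussed here. The `#'` lines are ordinary R comments, so the block runs as-is in base R.

```r
#' Convert temperatures from Celsius to Fahrenheit
#'
#' Hypothetical example function, used only to illustrate roxygen2 tags.
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#' @export
celsius_to_fahrenheit <- function(celsius) {
  # Fail early on non-numeric input
  stopifnot(is.numeric(celsius))
  celsius * 9 / 5 + 32
}
```

Inside a package, running `devtools::document()` would turn these comments into the corresponding help file, so a change to the function's arguments and a change to its documentation can be made in the same place.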
For novices without substantial software engineering experience, the early testing frameworks were also difficult to use. With the release of testthat (Wickham 2011), testing also became smoother. There are now several actively maintained testing frameworks, such as tinytest (van der Loo 2020), as well as testthat-compatible specialized tooling for testing database interactions (dittodb (Keane and Vargas 2020)) and web resources (vcr (Chamberlain 2021), httptest (Richardson 2021), and webfakes (Csárdi 2021), which enables the use of an embedded C/C++ web server for testing HTTP clients like httr2 (Wickham 2021)).
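To give a sense of what a testthat unit test looks like, here is a minimal sketch. It assumes the testthat package is installed, and the helper `rescale01()` is a hypothetical function written only for this example.

```r
library(testthat)

# Hypothetical helper under test: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  stopifnot(is.numeric(x))
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each test_that() call groups expectations about one behaviour
test_that("rescale01() maps the minimum to 0 and the maximum to 1", {
  out <- rescale01(c(2, 4, 6))
  expect_equal(out, c(0, 0.5, 1))
})

test_that("rescale01() rejects non-numeric input", {
  expect_error(rescale01("a"))
})
```

In a package, such tests would typically live in files under `tests/testthat/` and be run with `devtools::test()`.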
The testthat package has recently been improved with snapshot tests that make it possible to test plot outputs. The rOpenSci project has released autotest (Padgham 2021), a package that supports automatic mutation testing.
Beyond checking for compliance with R CMD check, several other packages, such as goodpractice (Csárdi and Frick 2018), riskmetric (R Validation Hub et al. 2021), and rOpenSci’s pkgcheck (Padgham and Salmon 2021), check packages against a large list of actionable, community-recommended best practices for software development. Collectively, these tools allow domain researchers to release software packages that meet high standards for software engineering.
The development and testing ecosystem of R is rich and has sometimes borrowed successful implementations from other languages (e.g. the vcr R package is a port, i.e. translation to R, of the vcr Ruby gem; testthat snapshot tests were inspired by JS Jest).
As underlined in Thieme (2018), community is the strong suit of the R language. Many organizations and venues offer dedicated support for package developers.
Examples include Q&A on the r-package-devel mailing list
The rOpenSci organization (Boettiger et al. 2015) is an attractive venue for developers and supporters of scientific R software. One of our most successful and continuing initiatives is our Software Peer Review system (Ram et al. 2019), a combination of academic peer review and code review from industry.
About 150 packages have been reviewed by volunteers to date, creating better packages as well as a growing knowledge base in our development guide (rOpenSci et al. 2021), while also building a living community of practice.
Our model has been the fundamental inspiration for projects such as the Journal of Open Source Software (Smith et al. 2018) and PyOpenSci (Wasser and Holdgraf 2019; Trizna et al. 2021).
We are continuously improving our system and reducing cognitive overload on editors and reviewers by automating repetitive tasks. Most recently we have expanded our offerings to peer review of packages that implement statistical methods (Statistical Software Peer Review) (Padgham et al. 2021).
Besides software review, the rOpenSci community is a safe, welcoming and informative place for package developers, with Q&A happening on our public forum and semi-open Slack workspace (Butland and LaZerte 2020).
The aforementioned tools, venues and organizations benefit from and support crucial dissemination efforts.
Publishing technical know-how is crucial for the progress of the R community. R news has been circulating on Twitter
In summary, we observe that there is already a strong software engineering culture in the R developer community. By surfacing the rich suite of resources to new developers, we can only hope the future will bring success to all the aforementioned initiatives. We recommend creating, updating and vetting packages with the tools we mentioned, as well as keeping up with community standards through the venues we mentioned. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers. Thanks to these efforts, we hope the R community will continue to be a thriving place of application for software engineering, by diverse practitioners from many different paths.
devtools, roxygen2, testthat, tinytest, dittodb, vcr, httptest, webfakes, httr2, autotest, goodpractice, riskmetric, pkgcheck
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Salmon & Ram, "The R Journal: The R Developer Community Does Have a Strong Software Engineering Culture", The R Journal, 2021
BibTeX citation
@article{RJ-2021-110, author = {Salmon, Maëlle and Ram, Karthik}, title = {The R Journal: The R Developer Community Does Have a Strong Software Engineering Culture}, journal = {The R Journal}, year = {2021}, note = {https://doi.org/10.32614/RJ-2021-110}, doi = {10.32614/RJ-2021-110}, volume = {13}, issue = {2}, issn = {2073-4859}, pages = {18-21} }