The R Developer Community Does Have a Strong Software Engineering Culture

Abstract:

There is a strong software engineering culture in the R developer community. We recommend creating, updating and vetting packages as well as keeping up with community standards. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers.

Published: Dec. 13, 2021

Received: Oct. 24, 2021

DOI: 10.32614/RJ-2021-110

Volume: 13/2

Pages: 18-21

Introduction

The R programming language was originally created for statisticians, by statisticians, but evolved over time to attract a “massive pool of talent that was previously untapped” (Hadley Wickham in Thieme 2018). Although most R users are academic researchers and business data analysts without a background in software engineering, we are witnessing a rapid rise in software engineering within the community. In this comment we spotlight recent progress in tooling, dissemination and support, including specific efforts led by the rOpenSci project. We hope that readers will take advantage of, and participate in, the tools and practices we describe.

The modern R package developer toolbox: user-friendlier, more comprehensive

The basic infrastructure for creating, building, installing, and checking packages has been in place since the early days of the R language. In that early period (1998-2011), the barriers to entry were very high, and access to support and Q&A for beginners was extremely limited. With the introduction of the devtools package in 2011, the process of creating and updating packages became substantially easier. Documentation also became simpler to maintain: the roxygen2 package allowed developers to keep documentation in sync with changes in code, similar to the doxygen approach embraced in more mature languages. Combined with the rise in popularity of StackOverflow and the growth of rstats blogs, the number of packages on the Comprehensive R Archive Network (CRAN) skyrocketed from 400 new packages in 2010 to 1000 new packages by 2014. As of this writing, there are nearly 19k packages on CRAN.
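To make the roxygen2 approach concrete, here is a minimal sketch of documentation written next to the code it describes. The function rescale01() is a hypothetical example, not taken from any package mentioned here. The #' comments use standard roxygen2 tags (@param, @return, @examples, @export), which devtools::document() would turn into help files when run inside a package.

```r
#' Rescale a numeric vector to the unit interval
#'
#' Hypothetical helper, used only to illustrate roxygen2 comments.
#'
#' @param x A numeric vector with at least two distinct values.
#' @param na.rm Should missing values be ignored when computing the range?
#' @return A numeric vector of the same length as x, with values in [0, 1].
#' @examples
#' rescale01(c(2, 4, 6))
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  # Smallest and largest observed values
  rng <- range(x, na.rm = na.rm)
  # Shift to start at zero, then divide by the width of the range
  (x - rng[1]) / (rng[2] - rng[1])
}
```

Because the comments sit directly above the function, a change to an argument and the matching change to its documentation can be made in the same edit.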

For novices without substantial software engineering experience, the early testing frameworks were also difficult to use. With the release of testthat, testing became smoother too. There are now several actively maintained testing frameworks, such as tinytest, as well as testthat-compatible specialized tooling for testing database interactions (dittodb) and web resources (vcr, httptest, and webfakes, which enables the use of an embedded C/C++ web server for testing HTTP clients like httr2).
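As an illustration of what a testthat test looks like, here is a minimal sketch. It assumes the testthat package is installed. The function c_to_f() is hypothetical and defined inline only so that the example is self-contained; in a package, such tests would normally live in files under tests/testthat/.

```r
library(testthat)

# Hypothetical function under test (not from any package named in this article):
# converts temperatures from degrees Celsius to degrees Fahrenheit.
c_to_f <- function(celsius) {
  celsius * 9 / 5 + 32
}

test_that("c_to_f() converts known reference points", {
  expect_equal(c_to_f(100), 212)
  # The function is vectorised, so several values can be checked at once
  expect_equal(c_to_f(c(0, -40)), c(32, -40))
})

test_that("c_to_f() rejects non-numeric input", {
  # Arithmetic on a character value raises an error in base R
  expect_error(c_to_f("warm"))
})
```

Each test_that() call groups related expectations under a readable description, which is what gets reported when an expectation fails.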

The testthat package has recently been improved with snapshot tests that make it possible to test plot outputs. The rOpenSci project has released autotest, a package that supports automatic mutation testing.

Beyond checking for compliance with R CMD check, several other packages, such as goodpractice, riskmetric, and rOpenSci’s pkgcheck, check packages against a large list of actionable, community-recommended best practices for software development. Collectively, these tools allow domain researchers to release software packages that meet high standards for software engineering.

The development and testing ecosystem of R is rich and has sometimes borrowed successful implementations from other languages (e.g. the vcr R package is a port, i.e. a translation to R, of the vcr Ruby gem; testthat snapshot tests were inspired by JS Jest [1]).

Emergence of a welcoming community

As underlined in Thieme (2018), community is the strong suit of the R language. Many organizations and venues offer dedicated support for package developers. Examples include Q&A on the r-package-devel mailing list [2], the package development category of the RStudio community forum [3], and the rstats section of StackOverflow [4]. Traditionally, R package developers have been mostly male and white. Although the status quo remains similar, efforts from groups such as R-Ladies [5] meetups, Minorities in R (Scott and Smalls-Perkins 2020), and the package development modules offered by Forwards for underrepresented groups [6] have made considerable inroads towards improving diversity. These efforts have worked hard to put the spotlight on developers beyond the “usual suspects”.

rOpenSci community and software review

The rOpenSci organization is an attractive venue for developers and supporters of scientific R software. One of our most successful and continuing initiatives is our Software Peer Review system (Ram et al. 2019), a combination of academic peer review and industry-style code review. About 150 packages have been reviewed by volunteers to date, creating better packages as well as a growing knowledge base in our development guide, while also building a living community of practice.
Our model has been the fundamental inspiration for projects such as the Journal of Open Source Software (Smith et al. 2018) and pyOpenSci (Wasser and Holdgraf 2019; Trizna et al. 2021). We are continuously improving our system and reducing cognitive overload on editors and reviewers by automating repetitive tasks. Most recently we have expanded our offerings to peer review of packages that implement statistical methods (Statistical Software Peer Review; Padgham et al. 2021).
Besides software review, the rOpenSci community is a safe, welcoming and informative place for package developers, with Q&A happening on our public forum and semi-open Slack workspace.

Creation and dissemination of resources for R programmers

The aforementioned tools, venues and organizations benefit from and support crucial dissemination efforts.
Publishing technical know-how is crucial for the progress of the R community. R news circulates on Twitter [7], R Weekly [8] and R-Bloggers [9]. Some sources are aimed more specifically at R package developers of various experience levels and interests. While “Writing R Extensions” [10] is the official and exhaustive reference on writing R packages, it is a reference rather than a learning resource: many R package developers, if not learning by example, are introduced to R package development via introductory blog posts or tutorials. The R Packages book by Hadley Wickham and Jenny Bryan (Wickham 2015; Wickham and Bryan), which accompanies the devtools suite of packages, is freely available online and strives to improve the R package development experience.
The rOpenSci guide “rOpenSci Packages: Development, Maintenance, and Peer Review” (rOpenSci et al. 2021) contains our community-contributed guidance on how to develop packages and review them. It features opinionated requirements such as the use of roxygen2 for package documentation; criteria for making informed decisions on gray-area topics such as limiting dependencies; and advice on widely accepted and emerging tools. As it is a living document also used as a reference for editorial decisions, we maintain a changelog [11] and summarize each release in a blog post [12]. rOpenSci also hosts a book on a specialized topic, HTTP testing in R [13], which presents both principles for testing packages that interact with web resources and relevant packages. Besides these examples of long-form documentation, knowledge about R software engineering is shared through blogs and talks.
In the R blogging world, the rOpenSci blog posts [14], technical notes [15] and a section of our monthly newsletter [16] feature topics relevant to package developers, as do some of the posts on the Tidyverse blog [17]. The blog of the R-hub project [18] covers package development topics, in particular common problems such as sharing data via R packages or understanding CRAN checks. Expert programmers have been sharing their R-specific wisdom as well as software engineering lessons learned from other languages (e.g. Jenny Bryan’s useR! keynote address “code feels, code smells” [19]).

Conclusion

In summary, we observe that there is already a strong software engineering culture in the R developer community. By surfacing this rich suite of resources to new developers, we hope to contribute to the future success of all the aforementioned initiatives. We recommend creating, updating and vetting packages with the tools we mentioned, as well as keeping up with community standards through the venues we mentioned in the previous sections. We invite contributions to the rOpenSci project, where participants can gain experience that will shape their work and that of their peers. Thanks to these efforts, we hope the R community will continue to be a thriving place for the application of software engineering by diverse practitioners from many different paths.

CRAN packages used

devtools, roxygen2, testthat, tinytest, dittodb, vcr, httptest, webfakes, httr2, autotest, goodpractice, riskmetric, pkgcheck

CRAN Task Views implied by cited packages

WebTechnologies, Databases

Footnotes

  1. https://www.tidyverse.org/blog/2020/10/testthat-3-0-0/#snapshot-testing
  2. https://stat.ethz.ch/mailman/listinfo/r-package-devel
  3. https://community.rstudio.com/c/package-development/11
  4. https://stackoverflow.com/questions/tagged/r?tab=Newest
  5. http://rladies.org/
  6. https://buzzrbeeline.blog/2021/02/09/r-forwards-package-development-modules-for-women-and-other-underrepresented-groups/
  7. https://www.t4rstats.com/
  8. https://rweekly.org/
  9. https://www.r-bloggers.com/
  10. https://cran.r-project.org/doc/manuals/R-exts.html
  11. https://devguide.ropensci.org/booknews.html
  12. https://ropensci.org/tags/dev-guide/
  13. https://books.ropensci.org/http-testing/
  14. https://ropensci.org/blog/
  15. https://ropensci.org/technotes/
  16. https://ropensci.org/news/
  17. https://www.tidyverse.org/categories/programming/
  18. https://blog.r-hub.io/post/
  19. https://github.com/jennybc/code-smells-and-feels

References

C. Boettiger, S. Chamberlain, E. Hart and K. Ram. Building software, building community: Lessons from the rOpenSci project. Journal of Open Research Software, 3(1): e8, 2015. DOI 10.5334/jors.bu.
S. Butland and S. LaZerte. rOpenSci community contributing guide. Zenodo, 2020. URL https://contributing.ropensci.org/.
S. Chamberlain. Vcr: Record ’HTTP’ calls to disk. 2021. URL https://CRAN.R-project.org/package=vcr. R package version 1.0.2.
G. Csárdi. Webfakes: Fake web apps for HTTP testing. 2021. https://webfakes.r-lib.org/, https://github.com/r-lib/webfakes.
G. Csárdi and H. Frick. Goodpractice: Advice on R package building. 2018. URL https://CRAN.R-project.org/package=goodpractice. R package version 1.0.2.
J. Keane and M. Vargas. Dittodb: A test environment for database requests. 2020. URL https://CRAN.R-project.org/package=dittodb. R package version 0.1.3.
M. Padgham. Autotest: Automatic package testing. 2021. https://docs.ropensci.org/autotest/, https://github.com/ropensci-review-tools/autotest.
M. Padgham and M. Salmon. Pkgcheck: rOpenSci package checks. 2021. https://docs.ropensci.org/pkgcheck/, https://github.com/ropensci-review-tools/pkgcheck.
M. Padgham, M. Salmon, N. Ross, J. Nowosad, R. FitzJohn, Y. Zhang, C. Sax, F. Rodriguez-Sanchez, F. Briatte and L. Collado-Torres. ropensci/statistical-software-review-book: Official first standards versions. Zenodo, 2021. URL https://doi.org/10.5281/zenodo.5556756.
R Validation Hub, D. Kelkhoff, M. Gotti, E. Miller, K. K, Y. Zhang, E. Milliman and J. Manitz. Riskmetric: Risk metrics to evaluating R packages. 2021. https://pharmar.github.io/riskmetric/, https://github.com/pharmaR/riskmetric.
K. Ram, C. Boettiger, S. Chamberlain, N. Ross, M. Salmon and S. Butland. A community of practice around peer review for long-term research software sustainability. Computing in Science & Engineering, 21(2): 59–65, 2019. DOI 10.1109/MCSE.2018.2882753.
N. Richardson. Httptest: A test environment for HTTP requests. 2021. https://enpiar.com/r/httptest/, https://github.com/nealrichardson/httptest.
rOpenSci, B. Anderson, S. Chamberlain, L. DeCicco, J. Gustavsen, A. Krystalli, M. Lepore, L. Mullen, K. Ram, N. Ross, et al. rOpenSci Packages: Development, Maintenance, and Peer Review. Zenodo, 2021. URL https://doi.org/10.5281/zenodo.4554776.
D. Scott and D. Smalls-Perkins. Introducing MiR: A community for underrepresented minority users of R. Medium, 2020. URL https://medium.com/@doritolay/introducing-mir-a-community-for-underrepresented-users-of-r-7560def7d861.
A. M. Smith, K. E. Niemeyer, D. S. Katz, L. A. Barba, G. Githinji, M. Gymrek, K. D. Huff, C. R. Madan, A. C. Mayes, K. M. Moerman, et al. Journal of open source software (JOSS): Design and first-year review. PeerJ Computer Science, 4: e147, 2018. URL https://doi.org/10.7717/peerj-cs.147.
N. Thieme. R generation. Significance, 15(4): 14–19, 2018. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2018.01169.x.
M. Trizna, L. A. Wasser and D. Nicholson. pyOpenSci: Open and reproducible research, powered by Python. Biodiversity Information Science and Standards, 5: e75688, 2021. URL https://doi.org/10.3897/biss.5.75688.
M. van der Loo. A method for deriving information from running R code. The R Journal, Accepted for publication, 2020. URL https://arxiv.org/abs/2002.07472.
L. A. Wasser and C. Holdgraf. pyOpenSci Promoting Open Source Python Software To Support Open Reproducible Science. In AGU Fall Meeting Abstracts, pages NS21A–13, 2019.
H. Wickham. httr2: Perform HTTP requests and process the responses. 2021. URL https://CRAN.R-project.org/package=httr2. R package version 0.1.1.
H. Wickham. R packages. O’Reilly Media, 2015.
H. Wickham. Testthat: Get started with testing. The R Journal, 3: 5–10, 2011. URL https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
H. Wickham and J. Bryan. R packages. URL https://r-pkgs.org/.
H. Wickham, P. Danenberg, G. Csárdi and M. Eugster. roxygen2: In-line documentation for R. 2021a. URL https://CRAN.R-project.org/package=roxygen2. R package version 7.1.2.
H. Wickham, J. Hester and W. Chang. Devtools: Tools to make developing R packages easier. 2021b. URL https://CRAN.R-project.org/package=devtools. R package version 2.4.2.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Salmon & Ram, "The R Journal: The R Developer Community Does Have a Strong Software Engineering Culture", The R Journal, 2021

BibTeX citation

@article{RJ-2021-110,
  author = {Salmon, Maëlle and Ram, Karthik},
  title = {The R Journal: The R Developer Community Does Have a Strong Software Engineering Culture},
  journal = {The R Journal},
  year = {2021},
  note = {https://doi.org/10.32614/RJ-2021-110},
  doi = {10.32614/RJ-2021-110},
  volume = {13},
  issue = {2},
  issn = {2073-4859},
  pages = {18-21}
}