Collections in R: Review and Proposal

Abstract:

R is a powerful tool for data processing, visualization, and modeling. However, R is slower than other languages used for similar purposes, such as Python. One reason for this is that R lacks base support for collections, abstract data types that store, manipulate, and return data (e.g., sets, maps, stacks). An exciting recent trend in the R extension ecosystem is the development of collection packages, packages that provide classes that implement common collections. At least 12 collection packages are available across the two major R extension repositories, the Comprehensive R Archive Network (CRAN) and Bioconductor. In this article, we compare collection packages in terms of their features, design philosophy, ease of use, and performance on benchmark tests. We demonstrate that, when used well, the data structures provided by collection packages are in many cases significantly faster than the data structures provided by base R. We also highlight current deficiencies among R collection packages and propose avenues of possible improvement. This article provides useful recommendations to R programmers seeking to speed up their programs and aims to inform the development of future collection-oriented software for R.
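To make the term "collection" concrete, the sketch below emulates a map and a stack using only base R. It is an illustration under stated assumptions, not the approach of any package reviewed in the article: the helper names (`map_new`, `map_put`, `map_get`, `map_has`, `stack_new`) are hypothetical and introduced here purely for the example.

```r
# Hypothetical helpers, not taken from the article or from any package.

# A map backed by a base R environment (hashed, modified by reference).
map_new <- function() new.env(hash = TRUE, parent = emptyenv())

map_put <- function(m, key, value) {
  assign(key, value, envir = m)
  invisible(m)
}

map_has <- function(m, key) exists(key, envir = m, inherits = FALSE)

map_get <- function(m, key, default = NULL) {
  if (map_has(m, key)) get(key, envir = m, inherits = FALSE) else default
}

# A stack backed by a list held in a closure.
stack_new <- function() {
  items <- list()
  list(
    push = function(x) {
      items <<- c(items, list(x))  # copies the list on every push
      invisible(NULL)
    },
    pop = function() {
      n <- length(items)
      if (n == 0L) stop("stack is empty")
      top <- items[[n]]
      items <<- items[-n]
      top
    },
    size = function() length(items)
  )
}

# Usage
m <- map_new()
map_put(m, "alpha", 1)
map_put(m, "beta", 2)

s <- stack_new()
s$push("a")
s$push("b")
top <- s$pop()
```

The list-backed stack copies its contents on each push, so its cost grows with the number of stored items. This is the kind of overhead that a dedicated collection implementation may be able to avoid.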


Author

Timothy Barry

Published

June 12, 2018

Received

Nov 5, 2017

DOI

10.32614/RJ-2018-037

Volume

10/1

Pages

455-471

CRAN packages used

Rcpp, hashr, hashFunction, filehashSQLite, tictoc, DSL, bit64, bit, Oarray, sets, filehash, hash, hashmap, rstackdeque, rstack, liqueueR, dequer, flifo, listenv, stdvectors, microbenchmark, neuroim, FindMinIC

CRAN Task Views implied by cited packages

HighPerformanceComputing, MedicalImaging, NumericalMathematics

Bioconductor packages used

S4Vectors

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Barry, "Collections in R: Review and Proposal", The R Journal, 2018

    BibTeX citation

    @article{RJ-2018-037,
      author = {Barry, Timothy},
      title = {Collections in R: Review and Proposal},
      journal = {The R Journal},
      year = {2018},
      note = {https://doi.org/10.32614/RJ-2018-037},
      doi = {10.32614/RJ-2018-037},
      volume = {10},
      issue = {1},
      issn = {2073-4859},
      pages = {455-471}
    }