A System for an Accountable Data Analysis Process in R

Abstract:

Efficiently producing transparent analyses may be difficult for beginners or tedious for the experienced. This implies a need for computing systems and environments that can efficiently satisfy reproducibility and accountability standards. To this end, we have developed a system, R package, and R Shiny application called adapr (Accountable Data Analysis Process in R) that is built on the principle of accountable units. An accountable unit is a data file (statistic, table, or graphic) that can be associated with a provenance, meaning how it was created, when it was created, and who created it. This is similar to the ‘verifiable computational results’ (VCR) concept proposed by Gavish and Donoho. Both accountable units and VCRs are version controlled, sharable, and can be incorporated into a collaborative project. However, accountable units use file hashes and, unlike VCRs, do not involve watermarking or public repositories. Reproducing collaborative work may be highly complex, requiring repeating computations on multiple systems from multiple authors; however, determining the provenance of each unit is simpler, requiring only a search using file hashes and version control systems.
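
As a rough illustration of how a file hash can tie a result file to its provenance, the following base R sketch records how, when, and by whom a file was created, then finds that record again from a copy of the file. The helper names record_provenance and find_provenance, the ledger data frame, and the script name make_summary.R are hypothetical and are not part of the adapr API; adapr's actual bookkeeping may differ.

```r
# Hypothetical helper (not an adapr function): record the provenance of one file.
record_provenance <- function(path, script) {
  data.frame(
    file    = basename(path),
    hash    = unname(tools::md5sum(path)),    # content hash identifies the file
    script  = script,                          # how it was created
    created = format(file.info(path)$mtime),   # when it was created
    author  = Sys.info()[["user"]],            # who created it
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: look up provenance records matching a file's hash.
find_provenance <- function(path, ledger) {
  ledger[ledger$hash == unname(tools::md5sum(path)), , drop = FALSE]
}

# Create a small result file and register it in an in-memory ledger.
out <- file.path(tempdir(), "summary_table.csv")
write.csv(head(mtcars), out)
ledger <- record_provenance(out, script = "make_summary.R")

# A renamed copy has the same content, so the hash search still finds it.
copy <- file.path(tempdir(), "received_copy.csv")
file.copy(out, copy, overwrite = TRUE)
match <- find_provenance(copy, ledger)
print(match[, c("file", "script", "author")])
```

Because the lookup keys on file content rather than file name or location, a collaborator who receives a renamed copy can still trace it back to the recorded script and author, provided the content is unchanged.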

Published

May 14, 2018

Received

Sep 30, 2016

DOI

10.32614/RJ-2018-001

Volume

10/1

Pages

6 - 21

CRAN packages used

knitr, rmarkdown, cacher, archivist, adapr, packrat

CRAN Task Views implied by cited packages

ReproducibleResearch

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Gelfond, et al., "The R Journal: A System for an Accountable Data Analysis Process in R", The R Journal, 2018

    BibTeX citation

    @article{RJ-2018-001,
      author = {Gelfond, Jonathan and Goros, Martin and Hernandez, Brian and Bokov, Alex},
      title = {The R Journal: A System for an Accountable Data Analysis Process in R},
      journal = {The R Journal},
      year = {2018},
      note = {https://doi.org/10.32614/RJ-2018-001},
      doi = {10.32614/RJ-2018-001},
      volume = {10},
      issue = {1},
      issn = {2073-4859},
      pages = {6-21}
    }