Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package

Abstract:

Regular expressions are powerful tools for extracting tables from non-tabular text data. Capturing regular expressions that describe the information to extract from column names can be especially useful when reshaping a data table from wide (few rows with many regularly named columns) to tall (fewer columns with more rows). We present the R package nc (short for named capture), which provides functions for wide-to-tall data reshaping using regular expressions. We describe the main new ideas of nc, and provide detailed comparisons with related R packages (stats, utils, data.table, tidyr, tidyfast, tidyfst, reshape2, cdata).
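
As a rough illustration of the idea described above, the following sketch uses a regular expression with named capture groups to reshape a wide table whose column names follow a regular pattern. It is written in Python with only the standard library and is not the nc API (nc is an R package). The sample rows, the column pattern, and the helper `wide_to_tall` are hypothetical, chosen only to show the technique.

```python
import re

# Hypothetical wide table: one row per flower, one column per (part, dim) pair.
wide_rows = [
    {"Species": "setosa", "Sepal.Length": 5.1, "Sepal.Width": 3.5,
     "Petal.Length": 1.4, "Petal.Width": 0.2},
    {"Species": "virginica", "Sepal.Length": 6.3, "Sepal.Width": 3.3,
     "Petal.Length": 6.0, "Petal.Width": 2.5},
]

# Named capture groups describe what to extract from each column name
# (an assumed pattern for this example: "<part>.<dim>").
COLUMN_PATTERN = re.compile(r"(?P<part>[^.]+)[.](?P<dim>[^.]+)")


def wide_to_tall(rows, pattern):
    """Melt columns whose names fully match `pattern` into one row each.

    Columns that do not match are treated as identifiers and copied as-is.
    """
    tall = []
    for row in rows:
        matches = {name: pattern.fullmatch(name) for name in row}
        id_values = {name: row[name] for name, m in matches.items() if m is None}
        for name, m in matches.items():
            if m is not None:
                # One output row per matching input column: identifiers,
                # the captured groups, and the cell value.
                tall.append({**id_values, **m.groupdict(), "value": row[name]})
    return tall


tall_rows = wide_to_tall(wide_rows, COLUMN_PATTERN)
for tall_row in tall_rows:
    print(tall_row)
```

Here the wide input has 2 rows and 5 columns, and the tall output has 8 rows and 4 columns (`Species`, `part`, `dim`, `value`).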

Published

June 6, 2021

Received

Apr 30, 2020

DOI

10.32614/RJ-2021-029

Volume

13/1

Pages

98 - 111

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2021-029.zip

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Hocking, "Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package", The R Journal, 2021

    BibTeX citation

    @article{RJ-2021-029,
      author = {Hocking, Toby Dylan},
      title = {Wide-to-tall Data Reshaping Using Regular Expressions and the nc Package},
      journal = {The R Journal},
      year = {2021},
      note = {https://doi.org/10.32614/RJ-2021-029},
      doi = {10.32614/RJ-2021-029},
      volume = {13},
      issue = {1},
      issn = {2073-4859},
      pages = {98-111}
    }