The Landscape of R Packages for Automated Exploratory Data Analysis

Abstract:

The increasing availability of large, noisy data sets with many heterogeneous variables has led to growing interest in automating common data analysis tasks. The most time-consuming part of this process is Exploratory Data Analysis (EDA), which is crucial for better domain understanding, data cleaning, data validation, and feature engineering. A growing number of libraries attempt to automate some of the typical EDA tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of fifteen popular R packages to identify the parts of the analysis that can be effectively automated with current tools and to point out new directions for further autoEDA development.


Authors

Mateusz Staniak
Przemysław Biecek

Published

Aug. 16, 2019

Received

Mar. 27, 2019

DOI

10.32614/RJ-2019-033

Volume/Issue

11/2

Pages

347-369

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded as RJ-2019-033.zip.

CRAN packages used

cranlogs, radiant, visdat, archivist, xtable, arsenal, DataExplorer, dataMaid, dlookr, ExPanDaR, explore, shiny, exploreR, funModeling, inspectdf, RtutoR, SmartEDA, data.table, summarytools, knitr, ggplot2, xray, tableone, describer, skimr, prettyR, Hmisc, ggfortify, autoplotly, gpairs, GGally, survminer, cr17, DALEX, iml

CRAN Task Views implied by cited packages

ReproducibleResearch, TeachingStatistics, MissingData, WebTechnologies, Bayesian, ClinicalTrials, Econometrics, Finance, Graphics, HighPerformanceComputing, Multivariate, OfficialStatistics, Phylogenetics, SocialSciences, Survival, TimeSeries

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Figures reused from other sources do not fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Staniak & Biecek, "The R Journal: The Landscape of R Packages for Automated Exploratory Data Analysis", The R Journal, 2019

    BibTeX citation

    @article{RJ-2019-033,
      author = {Staniak, Mateusz and Biecek, Przemysław},
      title = {The R Journal: The Landscape of R Packages for Automated Exploratory Data Analysis},
      journal = {The R Journal},
      year = {2019},
      note = {https://doi.org/10.32614/RJ-2019-033},
      doi = {10.32614/RJ-2019-033},
      volume = {11},
      issue = {2},
      issn = {2073-4859},
      pages = {347-369}
    }