VSURF: An R Package for Variable Selection Using Random Forests

Abstract:

This paper describes the R package VSURF. Based on random forests, and applicable to both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that may include some redundancy, which can be relevant for interpretation; the second is a smaller subset that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy first ranks the explanatory variables using the random forests permutation-based importance score, then introduces variables with a stepwise forward strategy. Both subsets can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and its application to real datasets is presented.
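
As a rough illustration of the workflow the abstract describes, the following minimal sketch runs the package with its default settings and reads back the two variable subsets. It assumes that VSURF is installed from CRAN and that the simulated `toys` dataset shipped with the package is available; the object name `toys_vsurf` and the seed value are arbitrary choices made for this example, not something the abstract specifies. With default settings the call can take several minutes.

```r
# Sketch only: requires the VSURF package from CRAN (install.packages("VSURF")).
library(VSURF)

# "toys" is the simulated classification dataset shipped with the package:
# toys$x holds the predictors, toys$y the class labels.
data("toys", package = "VSURF")

# Random forests are stochastic, so fix a seed (arbitrary value) for repeatability.
set.seed(2015)

# Run the full selection procedure, leaving every tuning parameter at its default.
toys_vsurf <- VSURF(x = toys$x, y = toys$y)

# Column indices of the variables kept for interpretation (redundancy allowed).
toys_vsurf$varselect.interp

# Column indices of the smaller, prediction-oriented subset.
toys_vsurf$varselect.pred
```

The indices refer to columns of `toys$x`, so `colnames(toys$x)[toys_vsurf$varselect.pred]` would give the corresponding variable names. Calling `summary(toys_vsurf)` or `plot(toys_vsurf)` gives an overview of each step.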


Published

Nov. 7, 2015

Received

Jul. 28, 2014

DOI

10.32614/RJ-2015-018

Volume

7/2

Pages

19 - 33

CRAN packages used

VSURF, rpart, randomForest, party, ipred, Boruta, varSelRF, spikeSlabGAM, BioMark, mlbench, mixOmics

CRAN Task Views implied by cited packages

MachineLearning, Environmetrics, Survival, ChemPhys, Multivariate, Bayesian, HighPerformanceComputing

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Genuer et al., "VSURF: An R Package for Variable Selection Using Random Forests", The R Journal, 2015

    BibTeX citation

    @article{RJ-2015-018,
      author = {Genuer, Robin and Poggi, Jean-Michel and Tuleau-Malot, Christine},
      title = {VSURF: An R Package for Variable Selection Using Random Forests},
      journal = {The R Journal},
      year = {2015},
      note = {https://doi.org/10.32614/RJ-2015-018},
      doi = {10.32614/RJ-2015-018},
      volume = {7},
      issue = {2},
      issn = {2073-4859},
      pages = {19-33}
    }