SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

Abstract:

The SimCorrMix package generates correlated continuous (normal, non-normal, and mixture), binary, ordinal, and count (regular and zero-inflated, Poisson and Negative Binomial) variables that mimic real-world data sets. Continuous variables are simulated using either Fleishman’s third-order or Headrick’s fifth-order power method transformation. Simulation occurs at the component level for continuous mixture distributions, and the target correlation matrix is specified in terms of correlations with components. However, the package contains functions to approximate expected correlations with continuous mixture variables. There are two simulation pathways, which calculate intermediate correlations involving count variables differently, increasing accuracy under a wide range of parameters. The package also provides functions to calculate cumulants of continuous mixture distributions, check parameter inputs, calculate feasible correlation boundaries, and summarize and plot simulated variables. SimCorrMix is an important addition to existing R simulation packages because it is the first to include continuous mixture and zero-inflated count variables in correlated data sets.
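
As a rough illustration of what calculating cumulants of a continuous mixture from its components involves, the first two cumulants (mean and variance) follow directly from the mixing weights and the component means and standard deviations. The sketch below is a minimal Python example under that standard result; it is not the package's R implementation, and the function name `mixture_mean_var` and the example parameters are hypothetical.

```python
def mixture_mean_var(weights, means, sds):
    """Mean and variance of a finite mixture, computed from component parameters.

    Hypothetical helper for illustration only; not part of SimCorrMix.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    # E[Y] = sum_i w_i * mu_i
    mean = sum(w * m for w, m in zip(weights, means))
    # E[Y^2] = sum_i w_i * (sigma_i^2 + mu_i^2)
    second_moment = sum(w * (s ** 2 + m ** 2)
                        for w, m, s in zip(weights, means, sds))
    # Var[Y] = E[Y^2] - E[Y]^2
    return mean, second_moment - mean ** 2


# Two-component mixture: 40% centered at -2, 60% centered at 2, unit SDs.
mean, var = mixture_mean_var([0.4, 0.6], [-2.0, 2.0], [1.0, 1.0])
print(mean, var)
```

The mixture variance exceeds the component variances because it also reflects the spread between the component means, which is why simulating at the component level and summarizing at the mixture level give different moments.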

Authors

Allison Fialkowski

Hemant Tiwari

Published

Aug. 15, 2019

Received

Apr 4, 2018

DOI

10.32614/RJ-2019-022

Volume

11/1

Pages

250 - 286

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2019-022.zip

CRAN packages used

AdaptGauss, DPP, bgmm, ClusterR, mclust, mixture, AdMit, bimixt, bmixture, CAMAN, flexmix, mixdist, mixtools, nspmix, MixtureInf, Rmixmod, hurdlr, zic, mixpack, distr, stats, rebmix, SimCorrMix, SimMultiCorrData, GenOrd, VGAM, Matrix, ggplot2, mvtnorm

CRAN Task Views implied by cited packages

Cluster, Distributions, Multivariate, Bayesian, Environmetrics, Econometrics, Psychometrics, ExtremeValue, Finance, Graphics, MetaAnalysis, NumericalMathematics, Phylogenetics, Robust, SocialSciences, Survival, TeachingStatistics

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Fialkowski & Tiwari, "The R Journal: SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions", The R Journal, 2019

    BibTeX citation

    @article{RJ-2019-022,
      author = {Fialkowski, Allison and Tiwari, Hemant},
      title = {The R Journal: SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions},
      journal = {The R Journal},
      year = {2019},
      note = {https://doi.org/10.32614/RJ-2019-022},
      doi = {10.32614/RJ-2019-022},
      volume = {11},
      issue = {1},
      issn = {2073-4859},
      pages = {250-286}
    }