The dGAselID package proposes an original approach to feature selection in high dimensional data. The method is built upon a diploid genetic algorithm. The genotype-to-phenotype mapping is modeled after incomplete dominance inheritance, which removes the need to define a dominance scheme. Fitness is evaluated by user-selectable supervised classifiers, chosen from a broad range of options, and cross-validation options are also available. A new approach to crossover, inspired by the random assortment of chromosomes during meiosis, is included. Several mutation operators, also inspired by genetics, are proposed. The package is fully compatible with the data formats used in Bioconductor and the MLInterfaces package, which makes it readily applicable to microarray studies, but it is flexible enough for other feature selection applications on high dimensional data. Several options for visualizing the evolution and the outcomes are implemented to facilitate the interpretation of results. The package’s functionality is illustrated by examples.
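To make the two central ideas above concrete, the following is a minimal conceptual sketch in Python, not the package's R implementation. The function names `express` and `random_assortment` are hypothetical. The rule used for heterozygous loci (a fair coin flip standing in for an intermediate phenotype) and the way a child is assembled (one randomly chosen chromosome from each parent) are assumptions made for illustration; the package may resolve both differently.

```python
import random


def express(chrom_a, chrom_b, rng):
    """Map a diploid genotype (two 0/1 chromosomes) to a 0/1 feature mask.

    Assumption: homozygous loci express their shared allele, while
    heterozygous loci are resolved by a fair coin flip, so neither allele
    has to be declared dominant.
    """
    if len(chrom_a) != len(chrom_b):
        raise ValueError("homologous chromosomes must have equal length")
    mask = []
    for a, b in zip(chrom_a, chrom_b):
        if a == b:
            mask.append(a)                  # homozygous locus
        else:
            mask.append(rng.randint(0, 1))  # heterozygous locus (assumed rule)
    return mask


def random_assortment(parent1, parent2, rng):
    """Build a child from one randomly chosen chromosome of each parent."""
    return (list(rng.choice(parent1)), list(rng.choice(parent2)))


rng = random.Random(0)

# Each individual is a pair of homologous chromosomes; 1 means "feature selected".
parent1 = ([1, 1, 0, 0], [1, 0, 0, 1])
parent2 = ([0, 1, 0, 1], [1, 1, 0, 0])

child = random_assortment(parent1, parent2, rng)
mask = express(child[0], child[1], rng)

# Indices of the features that a classifier would then be trained on.
selected = [i for i, bit in enumerate(mask) if bit == 1]
```

Because the number of ones in the mask is not fixed, the size of the selected feature subset can vary from one individual to the next, in line with the "variable number of features" in the article title.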
dGAselID, genalg, GA, nsga2R, gaoptim, STPGA, kofnGA, mogavs, gaselect, scales
MLInterfaces, ALL, genefilter, hgu95av2.db
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Melita & Holban, "dGAselID: An R Package for Selecting a Variable Number of Features in High Dimensional Data", The R Journal, 2017
BibTeX citation
@article{RJ-2017-040,
  author = {Melita, Nicolae Teodor and Holban, Stefan},
  title = {dGAselID: An R Package for Selecting a Variable Number of Features in High Dimensional Data},
  journal = {The R Journal},
  year = {2017},
  note = {https://doi.org/10.32614/RJ-2017-040},
  doi = {10.32614/RJ-2017-040},
  volume = {9},
  issue = {2},
  issn = {2073-4859},
  pages = {18-34}
}