Gender Prediction Methods Based on First Names with genderizeR

Abstract:

In recent years, there has been increased interest in methods for gender prediction based on first names that employ various open data sources. These methods have applications ranging from bibliometric studies to customizing commercial offers for web users. Analyses of gender disparities in science based on such methods are published in the most prestigious journals, although they could be improved by choosing the most suitable prediction method with optimal parameters and by performing validation studies using the best data source for a given purpose. There is also a need to monitor and report how well a given prediction method works in comparison to others. In this paper, the author recommends a set of tools (including one dedicated to gender prediction, the R package called genderizeR), data sources (including the genderize.io API), and metrics that could be fully reproduced and tested in order to choose the optimal approach suitable for different gender analyses.

Author

Kamil Wais

Published

July 22, 2016

Received

Dec 17, 2015

DOI

10.32614/RJ-2016-002

Volume

8/1

Pages

17-37

CRAN packages used

genderizeR, qdap, gender, babynames, sortinghat, stringr, tm, ROCR, verification, data.table, dplyr

CRAN Task Views implied by cited packages

HighPerformanceComputing, NaturalLanguageProcessing, Finance, MachineLearning, Multivariate, WebTechnologies

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Wais, "The R Journal: Gender Prediction Methods Based on First Names with genderizeR", The R Journal, 2016

    BibTeX citation

    @article{RJ-2016-002,
      author = {Wais, Kamil},
      title = {The R Journal: Gender Prediction Methods Based on First Names with genderizeR},
      journal = {The R Journal},
      year = {2016},
      note = {https://doi.org/10.32614/RJ-2016-002},
      doi = {10.32614/RJ-2016-002},
      volume = {8},
      issue = {1},
      issn = {2073-4859},
      pages = {17-37}
    }