Optimization Routines for Enforcing One-to-One Matches in Record Linkage Problems

Abstract:

Record linkage aims at quickly and accurately identifying if two records represent the same real world entity. In many applications, we are interested in restricting the linkage results to “1 to 1” links, that is a single record does not appear more than once in the output. This can be dealt with the transport algorithm. The optimization problem, however, grows quadratically in the size of the input, quickly becoming untreatable for cases with a few thousand records. This paper compares different solutions, provided by some R packages for linear programming solvers. The comparison is done in terms of memory usage and execution time. The aim is to overcome the current implementation in the toolkit RELAIS, specifically developed for record linkage problems. The results highlight improvements beyond expectations. In fact the tested solutions allow successfully executing the “1 to 1” reduction for large size datasets up to the largest sample surveys at National Statistical Institutes.

Cite PDF Tweet

Published

July 29, 2019

Received

Oct 26, 2018

DOI

10.32614/RJ-2019-008

Volume

Pages

11/1

185 - 197

Supplementary materials

Supplementary materials are available in addition to this article. It can be downloaded at RJ-2019-008.zip

CRAN packages used

lpSolve, Rglpk, ROI.plugin.clp, intpoint, slam

CRAN Task Views implied by cited packages

Optimization

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Moretti, et al., "The R Journal: Optimization Routines for Enforcing One-to-One Matches in Record Linkage Problems", The R Journal, 2019

    BibTeX citation

    @article{RJ-2019-008,
      author = {Moretti, Diego and Valentino, Luca and Tuoto, Tiziana},
      title = {The R Journal: Optimization Routines for Enforcing One-to-One Matches in Record Linkage Problems},
      journal = {The R Journal},
      year = {2019},
      note = {https://doi.org/10.32614/RJ-2019-008},
      doi = {10.32614/RJ-2019-008},
      volume = {11},
      issue = {1},
      issn = {2073-4859},
      pages = {185-197}
    }