EVALUATING RECORD LINKAGE METHODS FOR MANIFOLD IDENTITY DETECTION

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Olof Holmlund; [2019]

Abstract: Record linkage is the process of linking two or more records in a database to the same real-life entity. These records do not share a common identifier, which makes connecting them difficult: they can only be linked based on similarities in their data. This data can also contain errors, such as misspellings or missing fields, which further increases the difficulty of the task. In this thesis, common methods for comparing records and finding duplicates are presented. Methods for improving performance and reducing the computational resources required are also presented, to show how record linkage can be applied to large amounts of data. Building on this knowledge, several experiments comparing these methods have been conducted using data from two benchmark data sets: Freely Extensible Biomedical Record Linkage (FEBRL) and the North Carolina Voter Registration (NCVR) data set. The results show that different types of similarity measures can achieve similar performance, and that supervised methods provide better prediction rates than unsupervised methods. Finally, suggestions for future work and improvements are given.
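To make the general pipeline concrete, here is a minimal sketch of the three stages the abstract mentions: reducing the number of comparisons, comparing records field by field with a similarity measure, and classifying candidate pairs. It is not the thesis's implementation. The toy records, the blocking key (first letter of surname plus postcode), the use of Python's difflib ratio as the similarity measure, and the 0.8 threshold are all illustrative assumptions. The threshold rule also stands in for the unsupervised and supervised classifiers the thesis actually compares.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (id, given name, surname, postcode)
records = [
    (1, "john", "smith", "2600"),
    (2, "jon", "smith", "2600"),
    (3, "mary", "jones", "2913"),
    (4, "marie", "jones", "2913"),
    (5, "peter", "brown", "2000"),
]

def similarity(a: str, b: str) -> float:
    """Normalised string similarity in [0, 1].

    Uses difflib's ratio as a stand-in for record-linkage measures
    such as Jaro-Winkler or edit distance (an assumption, not the thesis's choice).
    """
    return SequenceMatcher(None, a, b).ratio()

def block_key(rec) -> str:
    # Hypothetical blocking key: first letter of surname + postcode.
    # Only records sharing a key are compared, avoiding all-pairs comparison.
    return rec[2][:1] + rec[3]

def candidate_pairs(recs):
    blocks = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)

def link(recs, threshold: float = 0.8):
    """Classify candidate pairs as matches with a simple threshold rule."""
    matches = []
    for r1, r2 in candidate_pairs(recs):
        # Average the field-wise similarities (given name, surname, postcode).
        fields = list(zip(r1[1:], r2[1:]))
        score = sum(similarity(a, b) for a, b in fields) / len(fields)
        if score >= threshold:
            matches.append((r1[0], r2[0], round(score, 3)))
    return matches

if __name__ == "__main__":
    print(link(records))  # [(1, 2, 0.952), (3, 4, 0.889)]
```

Blocking places record 5 in a block of its own, so it is never compared at all; on real data sets of the size of NCVR, this kind of indexing is what keeps the number of comparisons tractable.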