Biomedical Engineering Reference
In-Depth Information
15.4.2.5 Patient Data Linkage
Data linkage is more than a simple string comparison. The two main problems
are ambiguity in a patient's information (homonyms, same address, equivalent
birthdates) and errors in names. Three kinds of errors appear:
• Typographical errors (despite known spelling)
• Cognitive errors (comprehension problem)
• Phonetic errors (similar spelling)
The errors and variations mainly arise from the typing of handwritten data,
keyboard neighbors (k-i, e-r, etc.), data input during a telephone conversation,
and software or database limitations on input fields (length limits) that
force the use of abbreviations or initials. Several matching techniques aim to
measure similarity between strings. Two different approaches can be adopted:
• Pattern matching for flexible matching between two strings
• A combination of phonetic encoding and exact matching
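As a rough illustration of the two approaches, the following Python sketch implements a Jaro-Winkler similarity (a pattern-matching measure that returns a score between 0 and 1) and a phonetic key compared by exact match. Standard American Soundex is used here only as a stand-in phonetic encoder; it is not the Phonex algorithm. The function names, the 0.1 prefix scale, and the 4-character prefix limit are illustrative assumptions, not details of any particular linkage system.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and not too far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro score boosted for strings that share a common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


# Soundex digit classes (stand-in for a phonetic encoder such as Phonex).
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    """Standard American Soundex key (letter plus three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":          # h and w do not separate equal codes
            continue
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code             # vowels reset prev to ""
    return (key + "000")[:4]


def same_phonetic_key(a: str, b: str) -> bool:
    """Phonetic encoding followed by exact matching of the keys."""
    key = soundex(a)
    return key != "" and key == soundex(b)
```

With this sketch, jaro_winkler("MARTHA", "MARHTA") is about 0.961, while same_phonetic_key("Robert", "Rupert") is true because both names encode to R163. The first approach tolerates typing errors such as swapped letters; the second groups names that sound alike even when the spelling differs more widely.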
The similarity measurement is generally normalized: two strings are equivalent
with score = 1 and, if totally different, score = 0.
The efficiency of the solution will impact the percentage of automatic
matching. This ratio must be as high as possible while guaranteeing a low
level of false positives. For this linkage process, a combination of the
Jaro-Winkler [21] and Phonex [22] (French) algorithms is used. Different
weights are attributed according to the relevance and accuracy of the
information in the data set.
For each field, four different criteria define how to interpret matching
scores according to field type:
• Accuracy, which defines the relevance of information
• Blocking, in case of false matching (under threshold), where the
correspondence would be automatically rejected
• Weight (similar), which represents a factor attributed in case of similarity
(over threshold)
• Weight (different), in case of false matching, a division factor applied to
the global similarity
The distinction between weights for similar and different matches is necessary.
For example, the probability that a single patient has different last names
across distributed databases is small, so a last-name mismatch considerably
reduces the chance of a match. However, two entries with the same address do
not necessarily refer to the same patient. Table 15.1 summarizes the proposed
criteria adjustments for automatic record linkage. A weight factor is
attributed to each field and is submitted as input to the linkage algorithm.
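One possible reading of these criteria is sketched below in Python. The exact combination formula is not given in the text, so the weighted average divided by the "different" factors is an assumption, as are the names FieldRule and link_score and every threshold and weight value (they are not taken from Table 15.1). The accuracy criterion is left out because the text does not say how it enters the computation. Per-field scores are expected to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldRule:
    threshold: float         # score at or above this counts as "similar"
    blocking: bool           # below threshold rejects the pair outright
    weight_similar: float    # weight of the field when it is similar
    weight_different: float  # divisor applied when it is different (>= 1)


def link_score(scores: Dict[str, float],
               rules: Dict[str, FieldRule]) -> Optional[float]:
    """Combine per-field scores; None means automatic rejection."""
    weighted_sum = 0.0
    weight_total = 0.0
    divisor = 1.0
    for name, rule in rules.items():
        score = scores[name]
        if score >= rule.threshold:
            # Similar field: contributes to the weighted average.
            weighted_sum += rule.weight_similar * score
            weight_total += rule.weight_similar
        elif rule.blocking:
            # Blocking field under threshold: reject the correspondence.
            return None
        else:
            # Different field: penalize the global similarity.
            divisor *= rule.weight_different
    if weight_total == 0.0:
        return 0.0
    return (weighted_sum / weight_total) / divisor


# Hypothetical settings: a last-name mismatch is penalized heavily,
# while a matching address adds little evidence.
RULES = {
    "last_name": FieldRule(0.9, False, 3.0, 4.0),
    "birth_date": FieldRule(1.0, True, 2.0, 1.0),
    "address": FieldRule(0.8, False, 1.0, 2.0),
}

same_person = {"last_name": 1.0, "birth_date": 1.0, "address": 0.5}
name_differs = {"last_name": 0.5, "birth_date": 1.0, "address": 1.0}
date_differs = {"last_name": 1.0, "birth_date": 0.2, "address": 1.0}

print(link_score(same_person, RULES))   # 0.5
print(link_score(name_differs, RULES))  # 0.25
print(link_score(date_differs, RULES))  # None
```

Keeping separate weights for the similar and different cases lets a field such as the last name count strongly against a match when it differs, while a field such as the address adds only modest support when it agrees.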