3.8.3 Validating Record Matches
What remains is to determine how high a score indicates that two records truly represent
the same individual. In the example at hand, there was an easy way to make that decision,
and the technique can be applied in many similar situations. It was decided to look at the
creation dates of the records and to assume that 90 days was an absolute maximum delay
between the time the service was bought at Company A and the time it was registered at
Company B. Thus, a proposed match between two records chosen at random, subject only to
the constraint that the date on the B-record was between 0 and 90 days after the date on the
A-record, would have an average delay of 45 days, the midpoint of that window.
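The 45-day figure rests on an implicit assumption: for a random pair admitted by the 0-to-90-day window, the delay is equally likely to be anywhere in that window. The sketch below checks the arithmetic by simulation under that uniform-delay assumption. The function name and its parameters are illustrative, not part of the original example.

```python
import random

def average_random_delay(max_delay_days=90.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the mean delay for randomly paired records.

    Assumes (hypothetically) that the delay of a random pair is uniformly
    distributed between 0 and max_delay_days.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    total = sum(rng.uniform(0.0, max_delay_days) for _ in range(n_samples))
    return total / n_samples

print(round(average_random_delay(), 1))  # close to 45, i.e. 90 / 2
```

The same result follows directly from the mean of a uniform distribution on $[0, 90]$, which is $90/2 = 45$. The simulation just makes the assumption explicit.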
It was found that among the pairs with a perfect score of 300, the average delay was 10 days. If
you assume that 300-score pairs are surely correct matches, then you can look at the pool
of pairs with any given score $s$ and compute the average delay of those pairs. Suppose that
this average delay is $x$, and the fraction of true matches among the pairs with score $s$ is $f$.
True matches contribute an average delay of 10 days and false matches an average of 45 days, so
$x = 10f + 45(1 - f)$, or $x = 45 - 35f$. Solving for $f$, we find that the fraction of the pairs
with score $s$ that are truly matches is $(45 - x)/35$.
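The closed-form estimate can be written as a small function. This is a minimal sketch. The name `fraction_true_matches` and the clamping to $[0, 1]$ are illustrative choices, not part of the original method. The defaults are the 10-day and 45-day averages from the running example.

```python
def fraction_true_matches(avg_delay, true_delay=10.0, random_delay=45.0):
    """Estimate the fraction f of true matches among pairs with a given score.

    Solves avg_delay = true_delay * f + random_delay * (1 - f) for f.
    With the defaults this is f = (45 - avg_delay) / 35.
    """
    f = (random_delay - avg_delay) / (random_delay - true_delay)
    # Sampling noise can push the raw estimate slightly outside [0, 1];
    # clamping is a (hypothetical) convenience, not part of the derivation.
    return min(1.0, max(0.0, f))

print(fraction_true_matches(10.0))  # 1.0: same delay as the perfect matches
print(fraction_true_matches(45.0))  # 0.0: same delay as random pairs
print(fraction_true_matches(27.5))  # 0.5: halfway between the two
```

For example, a score whose pairs have a hypothetical average delay of 27.5 days would be estimated to consist of half true matches and half false ones.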
The same trick can be used whenever:
(1) There is a scoring system used to evaluate the likelihood that two records represent the
same entity, and
(2) There is some field, not used in the scoring, from which we can derive a measure that
differs, on average, for true pairs and false pairs.
For instance, suppose there were a “height” field recorded by both Companies A and B in
our running example. We can compute the average difference in height for pairs of random
records, and we can compute the average difference in height for records that have a perfect
score (and thus surely represent the same entities). For a given score $s$, we can evaluate
the average height difference of the pairs with that score and estimate the probability of the
records representing the same entity. That is, if $h_0$ is the average height difference for the
perfect matches, $h_1$ is the average height difference for random pairs, and $h$ is the average
height difference for pairs of score $s$, then the fraction of good pairs with score $s$ is
$(h_1 - h)/(h_1 - h_0)$.
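The height version can be applied to every score at once by grouping pairs by score. The sketch below assumes a hypothetical input format: each proposed match is a `(score, height_a, height_b)` tuple, and random pairs are `(height_a, height_b)` tuples. The function name and the sample heights are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def match_fraction_by_score(scored_pairs, random_pairs, perfect_score=300):
    """Estimate, for each score s, the fraction of pairs that are true matches.

    scored_pairs: iterable of (score, height_a, height_b) for proposed matches.
    random_pairs: iterable of (height_a, height_b) for randomly chosen pairs.
    perfect_score: the score treated as a sure match (300 in the running example).
    """
    # h1: average height difference for random pairs.
    h1 = mean(abs(a - b) for a, b in random_pairs)

    # Collect absolute height differences separately for each score.
    diffs_by_score = defaultdict(list)
    for score, a, b in scored_pairs:
        diffs_by_score[score].append(abs(a - b))

    # h0: average height difference for the perfect-score pairs.
    h0 = mean(diffs_by_score[perfect_score])
    if h1 == h0:
        raise ValueError("height does not separate true pairs from random pairs")

    # f = (h1 - h) / (h1 - h0), where h is the average difference at score s.
    return {s: (h1 - mean(d)) / (h1 - h0) for s, d in diffs_by_score.items()}

# Hypothetical heights in centimeters.
scored = [(300, 170, 171), (300, 180, 181), (200, 160, 161), (200, 150, 161)]
randoms = [(150, 160), (170, 182)]
print(match_fraction_by_score(scored, randoms))  # {300: 1.0, 200: 0.5}
```

Here $h_1 = 11$, $h_0 = 1$, and the score-200 pairs average $h = 6$, giving $(11 - 6)/(11 - 1) = 0.5$.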
3.8.4 Matching Fingerprints
When fingerprints are matched by computer, the usual representation is not an image, but
a set of locations in which minutiae are located. A minutia, in the context of fingerprint
descriptions, is a place where something unusual happens, such as two ridges merging or a