Fig. 7.9 Answer to query: the average age of the patients appearing in both data sets. Panels: (a) Equi-width; (b) Equi-depth. Horizontal axis: Age; plot label: Average age.
Fig. 7.10 Answer to query: the minimal age of the patients appearing in both data sets. Panels: (a) Equi-width; (b) Equi-depth. Horizontal axis: Age; plot label: Minimum age.
Link Plus is a widely used tool that computes the probability that two records
refer to the same individual. It matches the records in the two data sets based
on name, SSN and date of birth, and returns 4,658 pairs of records whose
linkage probabilities are greater than 0. The system suggests that a user set
a matching linkage probability threshold; the pairs of records passing the threshold
are considered matching. If we set the threshold to the default value of 0.25 suggested
by the system, only 99 pairs of records are returned.
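The thresholding step described above can be sketched in a few lines of Python. This is only an illustration, not Link Plus itself: the record identifiers and probabilities are made up, the function name `pairs_passing_threshold` is hypothetical, and treating "passing" as "greater than or equal to" is an assumption.

```python
from typing import List, Tuple

# Hypothetical linked pair: (cancer_registry_id, death_index_id, linkage_probability)
LinkedPair = Tuple[str, str, float]


def pairs_passing_threshold(pairs: List[LinkedPair],
                            threshold: float = 0.25) -> List[LinkedPair]:
    """Keep only the pairs whose linkage probability reaches the threshold.

    Assumption: a pair "passes" when its probability is >= threshold.
    """
    return [pair for pair in pairs if pair[2] >= threshold]


# Made-up linkage output; every pair has a probability greater than 0.
pairs = [("c1", "d7", 0.92), ("c2", "d3", 0.25), ("c3", "d9", 0.10), ("c4", "d1", 0.04)]
matched = pairs_passing_threshold(pairs)
```

Pairs below the threshold are dropped entirely, even though their linkage probability is positive. The probabilistic queries in the rest of this section keep them instead.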
First, we want to find the top-10 youngest patients who are in the cancer registry and
have a reported death. We therefore ask a probabilistic top-k query with k = 10 and p = 0.3.
For each linked pair, we use the average of the ages recorded in the Cancer Registry and the Social
Security Death Index. If we consider only the linked pairs whose probabilities pass
the matching threshold 0.25, then the top-10 youngest patients with their edges are
shown in Table 7.1. However, if we consider all linked pairs whose matching probabilities
are greater than 0 and find the patients whose top-10 probability is greater
than 0.3, we obtain the results shown in Table 7.2.
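To make the notion of top-k probability concrete, here is a minimal Python sketch. It assumes that linked pairs exist independently of one another, which is a simplification: the text does not state this, and in real linkage data two linkages that share a record are typically mutually exclusive. The function names `topk_probabilities` and `ptk_query` and the sample data are hypothetical, and ties in age are broken arbitrarily by input order.

```python
from typing import Dict, List, Tuple

# Hypothetical linked pair: (pair_id, age, linkage_probability)
Pair = Tuple[str, float, float]


def topk_probabilities(pairs: List[Pair], k: int) -> Dict[str, float]:
    """Probability that each pair is among the k youngest existing pairs.

    Assumption: pairs exist independently with their linkage probability.
    """
    ranked = sorted(pairs, key=lambda t: t[1])  # youngest first
    # dist[j] = Pr(exactly j of the pairs ranked so far exist), kept only for j < k
    dist = [1.0] + [0.0] * (k - 1)
    result: Dict[str, float] = {}
    for pair_id, _age, prob in ranked:
        # The pair is in the top-k if it exists and fewer than k younger pairs exist.
        result[pair_id] = prob * sum(dist)
        # Fold this pair into the distribution for the pairs that follow.
        updated = [0.0] * k
        for j in range(k):
            updated[j] = dist[j] * (1.0 - prob)
            if j > 0:
                updated[j] += dist[j - 1] * prob
        dist = updated
    return result


def ptk_query(pairs: List[Pair], k: int, p: float) -> Dict[str, float]:
    """Return the pairs whose top-k probability is greater than p."""
    return {pid: pr for pid, pr in topk_probabilities(pairs, k).items() if pr > p}


sample = [("a", 30, 0.5), ("b", 35, 0.8), ("c", 40, 0.9)]
answer = ptk_query(sample, k=2, p=0.3)
```

In this sample, pair "c" has the highest linkage probability but is the oldest, so its top-2 probability depends on how likely it is that at most one of the younger pairs exists.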
Then, we ask the following count query on the data sets: what is the number
of patients appearing in both data sets? The histogram answers are shown in
Figure 7.8. They are far from the 99 pairs returned when only the linked pairs
passing the matching threshold are considered.
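The reason a count query yields a histogram rather than a single number can be illustrated with a short Python sketch. It assumes, as a simplification not stated in the text, that the linked pairs are independent, so the count follows a Poisson-binomial distribution. The function names and the sample probabilities are hypothetical, and the bucketing shown is a plain equi-width grouping that may differ from how the figures were actually produced.

```python
from typing import List, Tuple


def count_distribution(probs: List[float]) -> List[float]:
    """dist[c] = Pr(exactly c linked pairs are true matches).

    Assumption: each pair is a true match independently with its probability.
    """
    dist = [1.0]
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            updated[c] += q * (1.0 - p)   # this pair is not a match
            updated[c + 1] += q * p       # this pair is a match
        dist = updated
    return dist


def equi_width_histogram(dist: List[float],
                         num_buckets: int) -> List[Tuple[Tuple[int, int], float]]:
    """Group the counts into at most num_buckets ranges of equal width."""
    width = -(-len(dist) // num_buckets)  # ceiling division
    return [((start, min(start + width, len(dist)) - 1), sum(dist[start:start + width]))
            for start in range(0, len(dist), width)]


# Made-up linkage probabilities for four linked pairs.
probs = [0.9, 0.6, 0.2, 0.1]
dist = count_distribution(probs)
expected_count = sum(c * q for c, q in enumerate(dist))
histogram = equi_width_histogram(dist, num_buckets=2)
```

With a threshold of 0.25, only two of these four made-up pairs would be counted. The distribution also reflects the chance that the low-probability pairs are true matches, which is why a thresholded count and a probabilistic answer can differ.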
Moreover, an average query finds the average age of the patients appearing
in both data sets. If only the 99 records whose matching probabilities are above
.