Name               Age   Top-10 probability
Larry Stonebraker  35    0.8375
Catherine Spicer   46    0.775
Bruce Mourer       47    0.87875
Jason Haddad       51    0.85625
Angelina Amin      53    0.885
Jo Avery           53    0.7975
Nicola Stewart     54    0.8575
Tiffany Marshall   57    0.86
Bridget Hiser      58    0.778
Table 7.2 The patients in the cancer registry whose top-10 probabilities are at least 0.3.
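The selection behind Table 7.2 (keep the patients whose top-10 probability reaches a given threshold) can be illustrated with a minimal sketch. It assumes the top-10 probabilities have already been computed; the helper name at_least, the tuple layout, and the threshold value 0.8 used in the call are illustrative assumptions and are not taken from the text. Only three rows of the table are copied in.

```python
from typing import List, Tuple

# (name, age, top-10 probability); the layout is an assumption for illustration.
Patient = Tuple[str, int, float]


def at_least(patients: List[Patient], threshold: float) -> List[Patient]:
    """Return the patients whose top-10 probability is >= threshold, in input order."""
    return [p for p in patients if p[2] >= threshold]


# A few rows copied from Table 7.2.
rows: List[Patient] = [
    ("Larry Stonebraker", 35, 0.8375),
    ("Catherine Spicer", 46, 0.775),
    ("Angelina Amin", 53, 0.885),
]

# 0.8 is a hypothetical threshold chosen so that the filter removes one row.
selected = at_least(rows, 0.8)
for name, age, prob in selected:
    print(name, age, prob)
```

Raising the threshold shrinks the answer set, and lowering it admits patients with weaker evidence of being in the top 10.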
[Figure: two panels, (a) Equi-width and (b) Equi-depth; x-axis: number of patients, 240 to 340.]
Fig. 7.8 Answer to query: the number of patients appearing in both data sets.
7.7 Empirical Evaluation
In this section, we report a systematic empirical study. All experiments were
conducted on a PC with a 3.0 GHz Pentium 4 CPU, 1.0 GB of main memory, and
a 160 GB hard disk, running the Microsoft Windows XP Professional Edition
operating system. Our algorithms were implemented in Microsoft Visual Studio 2005.
7.7.1 Results on Real Data Sets
First, we apply the ranking queries and aggregate queries to the Cancer Registry
data set and the Social Security Death Index provided in Link Plus 2.0 (see footnote 1).
The Cancer Registry data set contains 50,000 records, and each record describes
the personal information of a patient, such as name and SSN. The Social Security
Death Index data set contains 10,000 records, and each record contains the personal
information of an individual, such as name, SSN, and Death Date. Since the
information in some records is incomplete or ambiguous, we cannot always find
exact matches between records in the two data sets.
1 http://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm