$Pr^2(W^t_3(A)) = 1$, $Pr^2(W^t_3(B)) = \frac{2}{27}$, $Pr^2(W^t_3(C)) = \frac{5}{9}$ and
$Pr^2(W^t_3(D)) = \frac{10}{27}$. Therefore, the probabilistic threshold top-k query returns
$\{A, C\}$ at time instant $t$.
At time instant $t+1$, the top-k probabilities of the uncertain objects are:
$Pr^2(W^{t+1}_3(A)) = 1$, $Pr^2(W^{t+1}_3(B)) = \frac{2}{27}$, $Pr^2(W^{t+1}_3(C)) = \frac{4}{9}$ and
$Pr^2(W^{t+1}_3(D)) = \frac{13}{27}$. The probabilistic threshold top-k query returns
$\{A\}$ at time instant $t+1$.
The methods of answering probabilistic threshold top-k queries will be discussed
in Chapter 6.
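The selection step in this example can be sketched in a few lines of Python. This is only an illustration, not the query-answering method of Chapter 6: the top-2 probabilities are copied from the example, while the function name threshold_topk, the rule "top-k probability at least p", and the threshold value p = 1/2 are assumptions, since the threshold is not restated in this passage.

```python
from fractions import Fraction


def threshold_topk(topk_probs, p):
    """Return the objects whose top-k probability is at least p.

    topk_probs maps an object name to its (already computed) top-k
    probability; computing those probabilities is out of scope here.
    """
    return {obj for obj, prob in topk_probs.items() if prob >= p}


# Top-2 probabilities of the example at time instants t and t + 1.
window_t = {
    "A": Fraction(1),
    "B": Fraction(2, 27),
    "C": Fraction(5, 9),
    "D": Fraction(10, 27),
}
window_t1 = {
    "A": Fraction(1),
    "B": Fraction(2, 27),
    "C": Fraction(4, 9),
    "D": Fraction(13, 27),
}

P = Fraction(1, 2)  # assumed threshold, not given in this passage

answer_t = threshold_topk(window_t, P)
answer_t1 = threshold_topk(window_t1, P)
```

Under these assumptions, answer_t contains A and C and answer_t1 contains only A, as in the example. In each window the top-2 probabilities sum to 2, which is a quick sanity check that the values are consistent with k = 2.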
2.3.2 Probabilistic Linkage Model
In the basic uncertain object model, we assume that each instance belongs to a
unique object, though an object may have multiple instances. It is interesting to
ask what happens if an instance may belong to different objects in different possible worlds.
Such a model is useful in probabilistic linkage analysis, as shown in the following
example.
Example 2.12 (Probabilistic linkages). Survival-after-hospitalization is an important
measure used in public medical service analysis. For example, to obtain the
statistics about the death population after hospitalization, Svartbo et al. [36] study
survival-after-hospitalization by linking two real data sets, the hospitalization
registers and the national causes-of-death registers in some counties in Sweden. Such
a technique is called record linkage [37], which finds the linkages among data
entries referring to the same real-world entities from different data sources. However,
in real applications, data is often incomplete or ambiguous. Consequently, record
linkages are often uncertain.
Probabilistic record linkages are often used to model the uncertainty. For two
records, a state-of-the-art probabilistic record linkage method [37, 38] can estimate
the probability that the two records refer to the same real-world entity. To illustrate,
consider some synthesized records in the two data sets as shown in Table 2.4. The
column probability is calculated by a probabilistic record linkage method.
Two thresholds $\delta_M$ and $\delta_U$ are often used ($0 \le \delta_U < \delta_M \le 1$): when the linkage
probability is less than $\delta_U$, the records are considered not-matched; when the
linkage probability is between $\delta_U$ and $\delta_M$, the records are considered possibly matched;
and when the linkage probability is over $\delta_M$, the records are considered matched.
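The two-threshold rule can be written as a small function. This is a minimal sketch: the name classify_linkage and the string labels are hypothetical, and treating a probability exactly equal to either threshold as "possibly matched" is an assumption, because the text only says "less than", "between" and "over".

```python
def classify_linkage(prob, delta_u, delta_m):
    """Classify a record pair by its linkage probability.

    delta_u and delta_m are the lower and upper thresholds, with
    0 <= delta_u < delta_m <= 1.
    """
    if not 0 <= delta_u < delta_m <= 1:
        raise ValueError("require 0 <= delta_u < delta_m <= 1")
    if prob < delta_u:
        return "not-matched"
    if prob > delta_m:
        return "matched"
    # Assumption: probabilities equal to a threshold fall in this class.
    return "possibly matched"


# Illustrative probabilities only; they are not taken from Table 2.4.
labels = [classify_linkage(p, 0.35, 0.4) for p in (0.2, 0.38, 0.45)]
```

With thresholds 0.35 and 0.4, the three illustrative probabilities fall into the not-matched, possibly matched and matched classes respectively.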
Many previous studies focus on building probabilistic record linkages effectively
and efficiently.
Suppose a medical doctor wants to know which of two patients, John H. Smith or
Johnson R. Smith, died at a younger age. The doctor can set the two thresholds
$\delta_M$ and $\delta_U$ and compare the matched pairs of records. Suppose $\delta_M = 0.4$ and
$\delta_U = 0.35$. Then John H. Smith is matched to J. Smith, whose age is 61, and Johnson
R. Smith is matched to J. R. Smith, whose age is 45. Therefore, the medical doctor
concludes that Johnson R. Smith died at a younger age than John H. Smith. Is the
answer correct?
 