More recent work on record linkages in different scenarios includes [129, 130,
131, 132].
3.6.1.1 How Is Our Study Related?
In this study, we do not propose any record linkage methods. Instead, we focus
on how to use the probabilistic linkages produced by existing probabilistic record
linkage methods to answer aggregate queries in a meaningful and efficient way.
As illustrated in Example 2.12, traditional post-processing methods that transform
the probabilistic linkages into deterministic matches using thresholds may generate
misleading results. Moreover, all existing record linkage methods return linkage
probabilities only independently of one another; no existing method outputs joint
distributions. Therefore, deriving possible probabilities is far from trivial, as
will be shown in Chapter 7.
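To make the thresholding concern concrete, the following minimal sketch compares a thresholded count with an expected count. It is not a reproduction of Example 2.12: the probabilities, the threshold value, and the variable names are all hypothetical and chosen only for illustration.

```python
# Hypothetical linkage probabilities; these values are assumptions, not data
# from the text. Each entry is the probability that one linkage is a true match.
linkage_probs = [0.4, 0.4, 0.4, 0.45]

# Hypothetical threshold used by a deterministic post-processing step.
THRESHOLD = 0.5

# Thresholding keeps only linkages whose probability reaches the threshold.
thresholded_count = sum(1 for p in linkage_probs if p >= THRESHOLD)

# By linearity of expectation, the expected number of true matches is the sum
# of the individual probabilities, whether or not the linkages are dependent.
expected_count = sum(linkage_probs)

print(thresholded_count, expected_count)
```

Under these assumed values, thresholding reports no matches at all although more than one match is expected. The expected count needs only the individual probabilities; a full distribution over the count would require the joint distribution, which existing linkage methods do not provide.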
3.6.2 Probabilistic Graphical Models
Probabilistic graphical models are graphs that describe dependencies among random
variables. Generally, there are two types of probabilistic graphical models:
directed graphical models [133] (also known as Bayesian networks or belief networks)
and undirected graphical models [134] (also known as Markov networks or Markov
random fields).
In directed graphical models, a vertex represents a random variable. A directed
edge from vertex $X_a$ to vertex $X_b$ represents that the probability distribution
of $X_b$ is conditional on that of $X_a$.
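As a minimal sketch of the directed case, consider a two-vertex network with a single edge from X_a to X_b over binary variables. The probability tables and function names below are hypothetical; the point is only that the joint distribution is the product of P(X_a) and P(X_b | X_a).

```python
# Hypothetical two-vertex directed model X_a -> X_b with binary variables.
# All numbers are illustrative assumptions.
p_a = {0: 0.7, 1: 0.3}  # P(X_a)
p_b_given_a = {
    0: {0: 0.9, 1: 0.1},  # P(X_b | X_a = 0)
    1: {0: 0.2, 1: 0.8},  # P(X_b | X_a = 1)
}


def joint(a, b):
    """P(X_a = a, X_b = b) = P(X_a = a) * P(X_b = b | X_a = a)."""
    return p_a[a] * p_b_given_a[a][b]


def marginal_b(b):
    """P(X_b = b), obtained by summing the joint over X_a."""
    return sum(joint(a, b) for a in p_a)


total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
print(total, marginal_b(1))
```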
In undirected graphical models, an edge between two random variables represents
the dependency between the variables without a particular direction. A random
variable $X_a$ is independent of all variables that are not adjacent to $X_a$,
conditional on all variables adjacent to $X_a$.
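The conditional independence property can be checked numerically on a small example. The sketch below assumes a hypothetical chain X_a - X_b - X_c with made-up pairwise potentials; since X_c is not adjacent to X_a, X_a should be independent of X_c given its only neighbor X_b.

```python
from itertools import product

# Hypothetical positive potentials on the edges (a, b) and (b, c) of the chain
# X_a - X_b - X_c. The values are illustrative assumptions.
phi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_bc = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

NAMES = ("a", "b", "c")
STATES = list(product([0, 1], repeat=3))

# The joint distribution is the normalized product of the edge potentials.
unnormalized = {(a, b, c): phi_ab[(a, b)] * phi_bc[(b, c)] for a, b, c in STATES}
z = sum(unnormalized.values())
joint = {s: w / z for s, w in unnormalized.items()}


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p
        for s, p in joint.items()
        if all(s[NAMES.index(name)] == value for name, value in fixed.items())
    )


def is_conditionally_independent(tol=1e-12):
    """Check P(a, c | b) == P(a | b) * P(c | b) for every assignment."""
    for a, b, c in STATES:
        p_b = prob(b=b)
        lhs = prob(a=a, b=b, c=c) / p_b
        rhs = (prob(a=a, b=b) / p_b) * (prob(b=b, c=c) / p_b)
        if abs(lhs - rhs) > tol:
            return False
    return True


print(is_conditionally_independent())
```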
In an undirected graphical model, the joint probability distribution of the random
variables can be factorized in terms of the marginal distributions of the cliques
in the graph, provided that the graph does not contain a cycle of more than 3
vertices that is not contained in a clique [135].
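A small numerical check may help here. The sketch below assumes a hypothetical graph on four binary variables with maximal cliques {a, b, c} and {c, d}, which satisfies the condition above. It uses the standard form of the factorization, in which the product of the clique marginals is divided by the marginal of the variable shared by the two cliques. The potentials and names are assumptions made for illustration.

```python
from itertools import product


def psi_abc(a, b, c):
    """Hypothetical positive potential on the clique {a, b, c}."""
    return 1.0 + a + 2 * b * c


def psi_cd(c, d):
    """Hypothetical positive potential on the clique {c, d}."""
    return 4.0 if c == d else 1.0


STATES = list(product([0, 1], repeat=4))  # assignments (a, b, c, d)

# Joint distribution: normalized product of the clique potentials.
weights = {s: psi_abc(s[0], s[1], s[2]) * psi_cd(s[2], s[3]) for s in STATES}
z = sum(weights.values())
joint = {s: w / z for s, w in weights.items()}


def marginal(indices):
    """Marginal distribution over the variables at the given positions."""
    out = {}
    for s, p in joint.items():
        key = tuple(s[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


p_abc = marginal((0, 1, 2))  # clique marginal P(a, b, c)
p_cd = marginal((2, 3))      # clique marginal P(c, d)
p_c = marginal((2,))         # marginal of the shared variable c


def factorized(s):
    """P(a, b, c) * P(c, d) / P(c), built from the marginals only."""
    a, b, c, d = s
    return p_abc[(a, b, c)] * p_cd[(c, d)] / p_c[(c,)]


max_error = max(abs(joint[s] - factorized(s)) for s in STATES)
print(max_error)
```

The reported error should be at the level of floating-point rounding, which indicates that the joint distribution is recovered from the clique marginals alone in this example.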
3.6.2.1 How Is Our Study Related?
In this study, we develop PME-graphs as a specific type of undirected graphical
model. We exploit the special properties of PME-graphs beyond those of general
undirected graphical models, and study the factorization of the joint probabilities
in PME-graphs. Moreover, we develop efficient methods to evaluate aggregate queries
on linkages using PME-graphs.