Databases Reference
In-Depth Information
FIGURE 11.8
Probabilistic linkage.
Types of probabilistic links
There are multiple types of probabilistic links. Depending on the data type and the relevance of
the relationships, we can implement one linkage approach, or a combination of approaches, using
metadata and master data.
Consider two texts: “long John is a better donut to eat” and “John Smith lives in Arizona.” If we
run a metadata-based linkage between them, the only common word found is “John,” so the two
texts will be related even though there is no real probability of any linkage or relationship
between them. This represents a poor link, also called a weak link.
On the other hand, consider two other texts: “Blink University has released the latest winners list
for Dean's list, at deanslist.blinku.edu” and “Contact the Dean's staff via deanslist.blinku.edu.” The
web address becomes the linkage and can be used to join these two texts and additionally connect
the record to a student or dean's subject areas in the higher-education ERP platform. This represents
a strong link. The presence of a strong linkage between Big Data and the data warehouse does not
mean that a clearly defined business relationship exists between the environments; rather, it
indicates that some type of join exists within a given context.
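The contrast between the weak and strong links above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any particular platform: the function name classify_link, the stop-word list, and the rule that a shared host name counts as "strong" while a shared ordinary word counts as "weak" are all hypothetical choices made for this example.

```python
import re

# Assumed pattern for host-like tokens such as "deanslist.blinku.edu".
HOST_RE = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b")
WORD_RE = re.compile(r"[a-z']+")
# Small illustrative stop-word list (an assumption, not from the text).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "for", "at", "via", "has"}


def classify_link(text_a, text_b):
    """Return (link_type, shared_keys) for two texts.

    "strong": both texts mention the same host name.
    "weak":   the texts share only ordinary words.
    "none":   nothing in common.
    """
    a, b = text_a.lower(), text_b.lower()

    # A shared, specific identifier is treated as a strong link.
    shared_hosts = set(HOST_RE.findall(a)) & set(HOST_RE.findall(b))
    if shared_hosts:
        return "strong", shared_hosts

    # Otherwise fall back to plain word overlap, ignoring stop words.
    words_a = set(WORD_RE.findall(HOST_RE.sub(" ", a))) - STOP_WORDS
    words_b = set(WORD_RE.findall(HOST_RE.sub(" ", b))) - STOP_WORDS
    shared_words = words_a & words_b
    if shared_words:
        return "weak", shared_words

    return "none", set()


weak = classify_link(
    "long John is a better donut to eat",
    "John Smith lives in Arizona.",
)
strong = classify_link(
    "Blink University has released the latest winners list "
    "for Dean's list, at deanslist.blinku.edu",
    "Contact the Dean's staff via deanslist.blinku.edu.",
)
```

Here the first pair is linked only through the word "john" and is labeled weak, while the second pair shares the host name and is labeled strong. As the text notes, even a strong label only signals that a join is possible in some context; it does not establish a business relationship.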
Consider a text or an email:
From: John.Doe@yahoo.com
Subject: bill payment
Dear sir, we are very sorry to inform you that due to your poor customer service we are moving our
business elsewhere.
Regards, John Doe
 