Databases Reference
In-Depth Information
5.1
Link Discovery
Linking can be generally defined as connecting things that are somehow related .Inthe
context of Linked Data, the idea of linking is especially concerned with establishing
typed links between entities (i.e., classes, properties or instances) contained in knowl-
edge bases. Over the last years, several frameworks have been developed to address
the lack of typed links between the di
ff
erent knowledge bases on the Linked Data web.
Overall, two main categories of frameworks that aim to achieve this goal can be dif-
ferentiated. The first category implements ontology matching techniques and aims to
establish links between the ontologies underlying two data sources. The second and
more prominent category of approaches, dubbed instance matching approaches (also
called linking or link discovery approaches), aims to discover links between instances
contained in two data sources. It is important to notice that while ontology and instance
matching are similar to schema matching [128,127] and record linkage [162,38,25] re-
spectively (as known in the research area of databases), linking on the Web of Data is
a more generic and thus more complex task, as it is not limited to finding equivalent
entities in two knowledge bases. Rather, it aims at finding semantically related entities
and establishing typed links between them, most of these links being imbued with for-
mal properties (e.g., transitivity, symmetry, etc.) that can be used by reasoners and other
application to infer novel knowledge. In this section, we will focus on the discovery of
links between instances and use the term link discovery as name for this process. An
overview of ontology matching techniques is given in [42].
Formally, link discovery can be defined as follows:
Definition 3 (Link Discovery). Given two sets S (source) and T (target) of instances,
a (complex) semantic similarity measure
σ
: S
×
T
[0
,
1] and a threshold
θ∈
[0
,
1] ,
={
,
,
≥θ}
the goal of link discovery task is to compute the set M
( s
t )
( s
t )
.
In general, the similarity function used to carry out a link discovery task is described by
using a link specification (sometimes called linkage decision rule [68]).
5.2
Challenges
Two key challenges arise when trying to discover links between two sets of instances:
the computational complexity of the matching task per se and the selection of an appro-
priate link specification. The first challenge is intrinsically related to the link discovery
process. The time complexity of a matching task can be measured by the number of
comparisons necessary to complete this task. When comparing a source knowledge
base S with a target knowledge base T , the completion of a matching task requires
a-priori O (
) comparisons, an impractical proposition as soon as the source and
target knowledge bases become large. For example, discovering duplicate cities in DB-
pedia [6] alone would necessitate approximately 0
|
S
||
T
|
10 9
.
15
×
similarity computations.
Hence, the provision of time-e
cient approaches for the reduction of the time com-
plexity of link discovery is a key requirement to instance linking frameworks for Linked
Data.
The second challenge of the link discovery process lies in the selection of an ap-
propriate link specification. The configuration of link discovery frameworks is usually
 
Search WWH ::




Custom Search