Introduction to Linked Data and Its Lifecycle on the Web - Reasoning Web

Databases Reference

In-Depth Information

5.1

Link Discovery

Linking can be generally defined as connecting things that are somehow related .Inthe

context of Linked Data, the idea of linking is especially concerned with establishing

typed links between entities (i.e., classes, properties or instances) contained in knowl-

edge bases. Over the last years, several frameworks have been developed to address

the lack of typed links between the di

ff

erent knowledge bases on the Linked Data web.

Overall, two main categories of frameworks that aim to achieve this goal can be dif-

ferentiated. The first category implements ontology matching techniques and aims to

establish links between the ontologies underlying two data sources. The second and

more prominent category of approaches, dubbed instance matching approaches (also

called linking or link discovery approaches), aims to discover links between instances

contained in two data sources. It is important to notice that while ontology and instance

matching are similar to schema matching [128,127] and record linkage [162,38,25] re-

spectively (as known in the research area of databases), linking on the Web of Data is

a more generic and thus more complex task, as it is not limited to finding equivalent

entities in two knowledge bases. Rather, it aims at finding semantically related entities

and establishing typed links between them, most of these links being imbued with for-

mal properties (e.g., transitivity, symmetry, etc.) that can be used by reasoners and other

application to infer novel knowledge. In this section, we will focus on the discovery of

links between instances and use the term link discovery as name for this process. An

overview of ontology matching techniques is given in [42].

Formally, link discovery can be defined as follows:

Definition 3 (Link Discovery). Given two sets S (source) and T (target) of instances,

a (complex) semantic similarity measure

σ

: S

×

T

→

[0

,

1] and a threshold

θ∈

[0

,

1] ,

={

,

,σ

,

≥θ}

the goal of link discovery task is to compute the set M

( s

t )

( s

t )

.

In general, the similarity function used to carry out a link discovery task is described by

using a link specification (sometimes called linkage decision rule [68]).

5.2

Challenges

Two key challenges arise when trying to discover links between two sets of instances:

the computational complexity of the matching task per se and the selection of an appro-

priate link specification. The first challenge is intrinsically related to the link discovery

process. The time complexity of a matching task can be measured by the number of

comparisons necessary to complete this task. When comparing a source knowledge

base S with a target knowledge base T , the completion of a matching task requires

a-priori O (

) comparisons, an impractical proposition as soon as the source and

target knowledge bases become large. For example, discovering duplicate cities in DB-

pedia [6] alone would necessitate approximately 0

|

S

||

T

|

10 9

.

15

×

similarity computations.

Hence, the provision of time-e

cient approaches for the reduction of the time com-

plexity of link discovery is a key requirement to instance linking frameworks for Linked

Data.

The second challenge of the link discovery process lies in the selection of an ap-

propriate link specification. The configuration of link discovery frameworks is usually

Reasoning Web

Search WWH ::

Custom Search

Home