Using Linked Data - Linked Data: A Geographic Perspective

8.6.2 Automatic Discovery and Creation

We use the term matching to mean seeking out equivalences, at either the class or the instance level, and linking to mean relating two classes or instances by some other relationship. Both cases are commonly referred to as linking, but we feel the distinction is worth making. In some ways, matching is the simpler case, so we address it first.

Currently, people use top-down matching rules, such as string matching or hierarchical correspondence, to match two instances, along with various, usually heuristically derived, similarity measures. For example, when matching two places, we could combine the result of string matching their place names with some distance measure between their latitudes and longitudes. Of course, this is made a lot easier if the two datasets use the same set of properties, so ontology matching is also required. An alternative is bootstrapping, a bottom-up approach that uses a small set of manually matched or linked instances to derive a more general matching or linking rule for similar cases. So, for example, if we have stated that mm:Mereashire owl:sameAs dbpedia:MereaCounty and so on for several other counties, we could derive (a) that mm:County owl:equivalentClass dbpedia:County, and (b) that it would be worth doing string matching on the other counties in the two datasets.
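
The combination of a name comparison with a coordinate distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary keys (name, lat, lon), the 50 km cutoff, and the 0.6 weighting are all hypothetical, and difflib's ratio stands in for whichever string measure suits the data.

```python
from difflib import SequenceMatcher
from math import asin, cos, radians, sin, sqrt


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(h))


def match_score(place_a: dict, place_b: dict,
                max_km: float = 50.0, name_weight: float = 0.6) -> float:
    """Weighted mix of name similarity and spatial proximity, in [0, 1].

    max_km and name_weight are illustrative assumptions, not recommended values.
    """
    distance = haversine_km(place_a["lat"], place_a["lon"],
                            place_b["lat"], place_b["lon"])
    # Proximity is 1 at the same point and falls linearly to 0 at max_km.
    proximity = max(0.0, 1.0 - distance / max_km)
    names = name_similarity(place_a["name"], place_b["name"])
    return name_weight * names + (1.0 - name_weight) * proximity
```

A pair of places would then be proposed as an owl:sameAs candidate when match_score exceeds a threshold chosen by inspecting a sample of known matches. Both inputs must already expose the same name and coordinate fields, which is where the ontology matching step comes in.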

There are several tools that can assist with link discovery. One example is Silk,¹² a graphical tool for identifying links between one RDF dataset and another on the Linked Data Web. Another tool that may be of use is the Linked Data Integration Framework,¹³ which works with a Silk link-mapping specification and handles the disparities that can occur when some datasets are available only as RDF/XML dumps while others are offered via SPARQL endpoints. The LIMES¹⁴ (Link Discovery for Metric Spaces) tool has both a stand-alone option and a Web interface that works with SPARQL endpoints. LIMES works by finding a set of examples in the target dataset and matching each of the instances in that target dataset to its nearest example. Next, the distance between each target example and all the source instances is calculated, and any obvious mismatches (which have a large distance) are filtered out. Then, the actual distances between the source instances and the most likely target instances are calculated. This approach reduces the search space and the number of similarity calculations that have to be carried out. Finally, the source and target instances with the highest similarity are output in N-Triples format. Another approach to link discovery is to use Bayesian belief networks, as in the RiMOM¹⁵ (Risk Minimization-Based Ontology Mapping) tool; however, this has so far been demonstrated only on a benchmark dataset.
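
The exemplar-based filtering described for LIMES can be illustrated with a small sketch. It is written in the spirit of that description and is not the LIMES implementation: it assumes instances are plain coordinate pairs compared by Euclidean distance, picks exemplars by a greedy farthest-point rule, and uses the triangle inequality to skip pairs that cannot fall within the threshold. All function names and example URIs are hypothetical.

```python
from math import dist

SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def pick_exemplars(targets: dict, k: int) -> list:
    """Greedy farthest-point choice of up to k exemplar URIs (an assumed strategy)."""
    uris = sorted(targets)
    exemplars = [uris[0]]
    while len(exemplars) < min(k, len(uris)):
        exemplars.append(max(
            (u for u in uris if u not in exemplars),
            key=lambda u: min(dist(targets[u], targets[e]) for e in exemplars),
        ))
    return exemplars


def discover_links(sources: dict, targets: dict, threshold: float, k: int = 2):
    """Return (source, target) pairs within threshold and the number of full comparisons."""
    exemplars = pick_exemplars(targets, k)
    # Assign each target instance to its nearest exemplar and keep that distance.
    clusters = {e: [] for e in exemplars}
    for uri, point in targets.items():
        nearest = min(exemplars, key=lambda e: dist(point, targets[e]))
        clusters[nearest].append((uri, dist(point, targets[nearest])))

    links, comparisons = [], 0
    for s_uri, s_point in sources.items():
        for e in exemplars:
            d_se = dist(s_point, targets[e])  # source-to-exemplar distance
            for t_uri, d_et in clusters[e]:
                # Triangle inequality: d(s, t) >= |d(s, e) - d(e, t)|,
                # so a large gap rules the pair out without comparing it.
                if abs(d_se - d_et) > threshold:
                    continue
                comparisons += 1
                if dist(s_point, targets[t_uri]) <= threshold:
                    links.append((s_uri, t_uri))
    return links, comparisons


def to_ntriples(links: list) -> str:
    """Serialize the discovered pairs as owl:sameAs statements in N-Triples."""
    return "\n".join(f"<{s}> {SAME_AS} <{t}> ." for s, t in links)
```

Because the pruning test is only a lower bound on the true distance, no pair within the threshold is lost; the saving is in how many full comparisons are avoided, which depends on how well the exemplars spread across the target data.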

At the time of writing, automated link discovery was still a very immature area and the subject of ongoing research. Most of the tools described have significant limitations in accuracy, scale, or robustness and are for the most part still emerging from the universities where they were developed. They are therefore not yet mature enough to offer commercial-quality solutions to the problem of link creation. Nevertheless, they are indicative of how the technology is developing. For specific datasets, the advice is still to write one's own link discovery scripts based on knowledge of the datasets, as this will produce higher accuracy than these more general tools.
