Publishing Linked Data - Linked Data: A Geographic Perspective


contains. A number of RDF features should be avoided when creating Linked Data: reification, RDF containers and collections, and blank nodes. Reified statements are difficult to query with SPARQL (the RDF query language explained in Chapter 8); it is better to add metadata to the Web document that contains the triples instead. RDF collections and containers also cannot easily be queried in SPARQL, so if the relative ordering of items in a set is significant, add multiple triples with the same subject URI and predicate, and then add further triples between the object URIs to describe the sequence explicitly. Blank nodes cannot be linked to from outside the document in which they appear, because their scope is limited to that document, and they pose a problem when data from different sources is merged, because they cannot be referenced by a URI. The recommendation, therefore, is to name every resource in the dataset with an explicit URI.
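The advice above on ordering and naming can be illustrated with a minimal Python sketch. It assumes a hypothetical namespace (`http://example.org/id/`) and two hypothetical predicates (`hasStop`, `nextStop`); none of these come from the book. Every resource gets an explicit URI rather than a blank node, and the order of the items is carried by extra triples between the object URIs instead of by an RDF collection.

```python
from __future__ import annotations

from urllib.parse import quote

# Hypothetical namespaces, chosen for illustration only.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab#"


def uri(local_name: str) -> str:
    """Mint an explicit URI for a resource instead of using a blank node."""
    return f"<{BASE}{quote(local_name)}>"


def ordered_stops(route: str, stops: list[str]) -> list[str]:
    """Return N-Triples lines for a route, its stops, and the stop order."""
    has_stop = f"<{VOCAB}hasStop>"
    next_stop = f"<{VOCAB}nextStop>"
    # One triple per item, all sharing the same subject URI and predicate.
    triples = [f"{uri(route)} {has_stop} {uri(s)} ." for s in stops]
    # Extra triples between the object URIs make the sequence explicit.
    triples += [
        f"{uri(a)} {next_stop} {uri(b)} ." for a, b in zip(stops, stops[1:])
    ]
    return triples


if __name__ == "__main__":
    for line in ordered_stops("route/7", ["stop/A", "stop/B", "stop/C"]):
        print(line)
```

Note that a plain `nextStop` link is not tied to one route, so this simple form is only suitable when an item belongs to a single sequence.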

7.5 LINKED DATA GENERATION

7.5.1 Plaintext Data Sources

Although this is less common in a purely GI environment, there are many scenarios in which the contents of text documents need to be converted to Linked Data, for example, news stories, patents, or historical records. Like many information sources, these are likely to include some references to location as well. A tool such as Open Calais 5 or Ontos Miner 6 (which can be applied to a number of languages other than English) can identify the “entities” (the main people, organizations, places, objects, and events in the text) using various natural language-processing and machine-learning techniques and assign URIs to them. A word of warning, however: these tools make frequent mistakes. Precision rates are typically only around 80%, meaning that roughly one in five of the entities they report is wrong, so they should not be used without manual verification. Usually, the resulting RDF is embedded as RDFa metadata alongside the text as it is published on the Web (as explained in Section 7.5.6), which makes the text documents more easily discoverable and enables faceted browsing. However, it is equally possible for the RDF extracted from the plaintext to be stored in a triple store that is published to the Web or simply published as a static RDF/XML file.

7.5.2 Structured Data Sources

In contrast to plaintext documents, GI data is more usually accessible in some structured format, for example, CSV (comma-separated values), XML, or even an Excel spreadsheet. If the data from the GIS can be output as comma-separated files, code can be written in a scripting language such as Perl to convert it to RDF/XML. If the GI is in an XML-based format, XSLT transformations can be used instead. Several “RDF-izer” tools are available to assist in this process; they usually convert the original structured data to a static RDF file or load the RDF data into a triple store. These include Excel/CSV converters from Cambridge Semantics, 7 TopBraid, 8 and XLWrap. 9 A more comprehensive list of RDF conversion tools has been collected by the W3C and is available on its wiki. 10
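As a rough illustration of the scripted CSV conversion described above, here is a minimal sketch in Python rather than Perl, using only the standard library. It emits N-Triples instead of RDF/XML to keep the example short. The namespaces, column names, and sample rows are assumptions made for illustration, and every value is written as a plain string literal, which a real conversion would refine with datatypes and a proper vocabulary.

```python
from __future__ import annotations

import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, chosen for illustration only.
BASE = "http://example.org/id/place/"
VOCAB = "http://example.org/vocab#"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(csv_text: str, key_column: str) -> list[str]:
    """Turn each CSV row into triples: one subject URI per row,
    one predicate per column, and the cell value as the object."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            # Skip the key column, empty cells, and overflow fields.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


# Hypothetical sample data standing in for a GIS export.
SAMPLE = (
    "id,name,easting,northing\n"
    "1,Southampton,442000,112000\n"
    "2,Romsey,435000,121000\n"
)

if __name__ == "__main__":
    print("\n".join(csv_to_ntriples(SAMPLE, "id")))
```

The output could be saved as a static file or loaded into a triple store, matching the two outcomes the RDF-izer tools offer.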
