contains. There are a number of RDF features that should be avoided when creating
Linked Data. These are reification, RDF containers and collections, and blank
nodes. Reified statements are difficult to query with SPARQL (the RDF query language
explained in Chapter 8). It is better to add metadata to the Web document that
contains the triples instead. RDF collections and containers are also awkward to query
in SPARQL, so if the relative ordering of items in a set is significant, add multiple
triples with the same subject URI and predicate, and then add further triples
between the object URIs to describe the sequence explicitly. Blank
nodes cannot be linked to from outside the document in which they appear as their
scope is limited to that document, and they pose a problem when data from different
sources is merged as they cannot be referenced by a URI. So, the recommendation is
to name every resource in the dataset with an explicit URI.
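The two recommendations above (explicit ordering triples instead of an RDF collection, and a URI for every resource instead of blank nodes) can be illustrated with a minimal sketch. The namespaces, the `hasStop` and `next` predicates, and the route and stop identifiers are all hypothetical, chosen only for illustration; the output is plain N-Triples text built with string formatting rather than an RDF library.

```python
BASE = "http://example.org/id/"      # hypothetical namespace for resources
VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary namespace


def ordered_triples(subject, predicate, items):
    """Return N-Triples lines that link `subject` to every item and
    record the item order with explicit `next` triples between items."""
    s = f"<{BASE}{subject}>"
    p = f"<{VOCAB}{predicate}>"
    nxt = f"<{VOCAB}next>"  # hypothetical ordering predicate
    objects = [f"<{BASE}{item}>" for item in items]

    # One triple per item, all sharing the same subject and predicate.
    lines = [f"{s} {p} {o} ." for o in objects]
    # One triple per consecutive pair, stating the sequence explicitly.
    lines += [f"{a} {nxt} {b} ." for a, b in zip(objects, objects[1:])]
    return lines


route = ordered_triples("route/66", "hasStop", ["stop/a", "stop/b", "stop/c"])
print("\n".join(route))
```

Every node in the output is a URI, so each stop can be linked to from other datasets, and the sequence can be recovered by following the `next` triples with an ordinary SPARQL pattern.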
7.5 LINKED DATA GENERATION
7.5.1 Plaintext Data Sources
Although this is less common in a purely GI environment, there are many scenarios
for which the contents of text documents need to be converted to Linked Data, for
example, news stories, patents, or historical records, and these, like many information
sources, are likely to include some references to location as well. A tool such as
Open Calais 5 or Ontos Miner 6 (which can be applied to several languages other
than English) can identify the “entities” in the text (the main people, organizations,
places, objects, and events) using various natural language-processing and
machine-learning techniques and assign URIs to them. However, a word of warning
here: these tools are often wrong. Precision rates are typically only
around 80%, so they should not be used without manual verification. Usually, the
resulting RDF is embedded as RDFa metadata alongside the text as it is published
on the Web (as explained in Section 7.5.6), making the text documents more easily
discoverable and enabling faceted browsing. However, it is equally possible for the
RDF extracted from the plaintext to be stored in a triple store that is published to the
Web or simply published as a static RDF/XML file.
7.5.2 Structured Data Sources
In contrast to plaintext documents, GI data is more usually accessible in some structured
format, for example, CSV (comma-separated values), XML, or even an Excel
spreadsheet. If the data from the GIS can be output as comma-separated files, a script
written in a language such as Perl can convert them to RDF/XML.
If the GI is in an XML-based format, XSLT transformations are possible instead.
There are several “RDF-izer” tools available to assist in this process. The tools
usually convert the original structured data format to a static RDF file or load the
RDF data into a triple store. These include Excel/CSV converters from Cambridge
Semantics, 7 TopBraid, 8 and XLWrap. 9 A more comprehensive list of RDF conversion
tools has been collected by the W3C and is available on its wiki. 10
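The scripted CSV conversion mentioned above can be sketched as follows. This is only an illustration under several assumptions: it uses Python rather than Perl, emits N-Triples rather than RDF/XML to keep the example short, treats every value as a plain string literal, and relies on a hypothetical namespace, vocabulary, and sample data. A real conversion would map columns to an agreed ontology and datatype the literals.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/id/place/"  # hypothetical URI namespace
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary namespace


def escape_literal(value):
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Convert CSV text to N-Triples lines: one subject URI per row,
    one predicate per column, and the cell value as a plain literal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Mint an explicit URI for the row from its identifier column.
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the identifier itself, surplus cells, and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{VOCAB}{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical sample export from a GIS.
SAMPLE = (
    "id,name,easting,northing\n"
    "1,Town Hall,441500,112300\n"
    "2,Library,441720,112050\n"
)

triples = csv_to_ntriples(SAMPLE, "id")
print("\n".join(triples))
```

The resulting lines can be written to a static file or loaded into a triple store, which are the two outputs the conversion tools listed above typically offer.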