from the Web and typically use rule-based materialisation (as presented in
Section 10) to cautiously infer additional RDF triples; Section 5 will discuss
such cautious reasoning techniques that are tailored to not infer too much
information in this setting.
2. Rather than relying on a centralised index, Linked Data itself can be viewed
as a database that can be queried directly and dynamically [33]. That is,
starting from the URIs appearing in the query, or from a set of seed URIs,
query-relevant data is navigated following Linked Data principles and
retrieved dynamically. The main advantage of this approach is that results
are much fresher and not all query-relevant data needs to be known
locally. However, the main weaknesses of this approach are that performing
remote lookups at query-time is costly, potentially a lot of intermediate
data is transferred, and the recall depends significantly on the seed URIs
and on the conformance of the query-relevant sources with Linked Data
principles. In
Section 6, we will present such an approach and discuss how reasoning can
be incorporated.
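The traversal idea in item 2 can be illustrated with a minimal sketch. It is not the approach of [33] or of Section 6; it only assumes a toy setting in which the Web is replaced by an in-memory dictionary (FAKE_WEB, hypothetical) and every subject and object term is treated as a dereferenceable URI. The names dereference, traverse and persons are likewise made up for illustration.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Web: "dereferencing" a URI
# returns the RDF triples published in the document found at that URI.
FAKE_WEB = {
    "ex:alice": [("ex:alice", "rdf:type", "foaf:Person"),
                 ("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "rdf:type", "foaf:Person"),
               ("ex:bob", "foaf:knows", "ex:carol")],
    "ex:carol": [("ex:carol", "rdf:type", "foaf:Person")],
    # ex:dave is published, but no other document links to it.
    "ex:dave": [("ex:dave", "rdf:type", "foaf:Person")],
}


def dereference(uri):
    """Simulated remote lookup; unknown URIs yield no triples."""
    return FAKE_WEB.get(uri, [])


def traverse(seed_uris, max_lookups=10):
    """Breadth-first link traversal starting from the seed URIs.

    Returns the set of retrieved triples and the number of lookups made.
    """
    queue = deque(seed_uris)
    seen = set(seed_uris)
    triples = set()
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1  # every dereference counts, even if it returns nothing
        for s, p, o in dereference(uri):
            triples.add((s, p, o))
            # Follow links: schedule unseen subject and object terms.
            for term in (s, o):
                if term not in seen:
                    seen.add(term)
                    queue.append(term)
    return triples, lookups


def persons(triples):
    """Evaluate the pattern (?x rdf:type foaf:Person) over local triples."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o == "foaf:Person")


if __name__ == "__main__":
    found, cost = traverse(["ex:alice"])
    print(persons(found), "after", cost, "lookups")
```

In this toy setting, starting from ex:alice never reaches ex:dave, which mirrors the point that recall depends on the seed URIs, and the max_lookups bound stands in for the cost of remote lookups at query-time.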
Getting back to the challenges enumerated in the introduction, let us now briefly
discuss how these affect the architectural choice for a particular reasoning and
querying infrastructure.
C1 Linked Data is huge. Our example contains only sample data from two
example datasets in the Linked Data Web. The most recent incarnation of the
Linking Open Data cloud (from September 2011) is claimed to represent
over 31 billion triples spread across 295 datasets (see footnote 16). It is
obvious that staying
aware of and gathering this dynamically evolving data for query process-
ing is an issue in terms of scale, but also in terms of deciding what parts
of which datasets are relevant for the query at hand. For example, broad
queries like Query 5, without further scope, are notoriously hard to answer,
since instances of foaf:Person are spread over various datasets and
individual RDF files right across the Web: while data warehouses probably
do not provide complete results for such queries, on-the-fly traversal-based
approaches in the worst case cannot answer such queries at all or, depending
on the seed URIs, cause prohibitively many lookups during query processing.
C2 Linked Data is not “pure” OWL. When all usage of the rdfs: and owl:
vocabulary is within the “mappable” fragment for OWL (see, e.g., Table 1),
the RDF graph in question is interpretable under the OWL Direct Seman-
tics. However, arbitrary RDF published on the Web does not necessarily fall
within these bounds. For example, the FOAF vocabulary defines
inverse-functional datatype properties such as foaf:mbox_sha1sum, which is
disallowed by the restrictions under which the Direct Semantics is defined. Even
worse, one may find “harmful” RDF published online that makes reason-
Footnote 16: While the LOD cloud has not been updated since then, the source
it is based on (http://datahub.io/group/lodcloud) listed 337 LOD datasets at
the time of writing.
 