C5 Linked Data needs more than RDFS and OWL. There is more implicit knowledge
hidden in Linked Data than can be captured with the semantics of RDFS and OWL
alone; in fact, it may seem quite unintuitive that Query 1 from p. 101 does not
return IBM's revenue: the exchange rate between USD and EUR is itself available
as data on the Web, so why shouldn't the Web of Data be able to make use of this
knowledge? However, ontology languages like OWL and RDFS provide no means to
express mathematical conversions such as the one needed in this example. Although
this problem cannot be solved with current Linked Data standards and techniques,
we discuss a possible approach to tackling it in Section 7.
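To make the gap concrete, the following Python sketch applies the kind of arithmetic rule that RDFS and OWL entailment cannot express: multiplying a USD revenue by a USD-to-EUR exchange rate to infer a EUR revenue. The vocabulary (`ex:revenueUSD`, `ex:usdToEurRate`, `ex:revenueEUR`), the function name, and the numeric values are hypothetical placeholders, not real data. This is also not the approach discussed in Section 7; it only illustrates what such a conversion has to compute.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object); objects here are plain numbers.
Triple = Tuple[str, str, float]

# Hypothetical vocabulary: these predicate names are invented for illustration.
REVENUE_USD = "ex:revenueUSD"
REVENUE_EUR = "ex:revenueEUR"
USD_TO_EUR = "ex:usdToEurRate"


def derive_eur_revenue(triples: List[Triple]) -> List[Triple]:
    """Apply one arithmetic rule that RDFS/OWL semantics cannot express:
    from (?x revenueUSD ?v) and (ex:USD usdToEurRate ?r),
    infer (?x revenueEUR ?v * ?r)."""
    rates = [o for s, p, o in triples if s == "ex:USD" and p == USD_TO_EUR]
    inferred: List[Triple] = []
    for rate in rates:
        for s, p, o in triples:
            if p == REVENUE_USD:
                inferred.append((s, REVENUE_EUR, o * rate))
    return inferred


# Placeholder values only; these are not real revenue or exchange-rate figures.
data = [
    ("ex:IBM", REVENUE_USD, 100.0),
    ("ex:USD", USD_TO_EUR, 0.5),
]
print(derive_eur_revenue(data))  # [('ex:IBM', 'ex:revenueEUR', 50.0)]
```

Without a rate triple the rule derives nothing, which mirrors the situation in Query 1: the needed fact exists on the Web, but nothing in RDFS or OWL tells a reasoner to combine it arithmetically.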
4 How Much OWL Is Needed for Linked Data?
Given the variety of combinations of techniques and RDFS/OWL profiles that
can be applied for reasoning over Linked Data, an obvious question to ask is
which features of RDFS and OWL are most prominently used on the current
Web of Data?
Knowing which features are frequently used and which are rarely used provides
insights into how well the coverage of various OWL profiles suits the Linked Data
use-case and, in particular, into the relative costs of supporting or not
supporting the semantics of a given language primitive, depending on how widely
it is adopted in Web data and on the architecture chosen. For example, a language
feature that is costly to support, or that otherwise adds complexity to a
reasoning algorithm, could potentially be “turned off” with minimal practical
effect if it is found to be used very infrequently in real-world data.
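As a rough illustration of this trade-off, the sketch below flags features whose share of documents falls under a threshold as candidates for being turned off. The counts, the 0.1% threshold, and the function name `features_to_disable` are hypothetical. A real decision would also weigh how costly each feature is to support, which this sketch ignores.

```python
from typing import Dict, List


def features_to_disable(doc_counts: Dict[str, int], total_docs: int,
                        min_share: float) -> List[str]:
    """Return, sorted by name, the features used in fewer than min_share of
    all documents. min_share is a tunable assumption, not a value from the study."""
    return sorted(f for f, n in doc_counts.items() if n / total_docs < min_share)


# Hypothetical counts of documents using each feature, for illustration only.
counts = {"rdfs:subClassOf": 5000, "owl:sameAs": 3000, "owl:hasKey": 2}
print(features_to_disable(counts, total_docs=10000, min_share=0.001))  # ['owl:hasKey']
```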
In this section, we thus look at which features of RDFS and OWL are the most
and least widely adopted on the Web of Data. For the purposes of this study,
we take the Billion Triple Challenge 2011 corpus, which consists of 2.145 billion
quadruples crawled from 7.411 million RDF/XML documents through an open
crawl run in May/June 2011 spanning 791 pay-level domains.²⁰ This corpus
represents a broad sample of the Web of Data. We then look into the levels of
adoption of individual RDFS and OWL features within this broad corpus.
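Measuring adoption ultimately comes down to counting, for each feature, where it occurs in the corpus. The following sketch counts, per feature term, the distinct documents and the distinct domains whose quadruples mention it. It assumes a simplified notion of use: the term appears in predicate or object position, which covers both properties like `rdfs:subClassOf` and classes like `owl:TransitiveProperty` used with `rdf:type`. It also takes a caller-supplied `pld_of` function mapping a document URI to its domain; in the usage example the URI's host name is a crude stand-in for that. The function names and sample quadruples are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple
from urllib.parse import urlparse

# A quadruple: (subject, predicate, object, URI of the source document).
Quad = Tuple[str, str, str, str]


def adoption_counts(quads: Iterable[Quad], features: Set[str],
                    pld_of: Callable[[str], str]) -> Dict[str, Tuple[int, int]]:
    """Map each feature term to (number of documents, number of domains) using it.
    Simplification: a term counts as used if it occurs as predicate or object."""
    docs_using: Dict[str, Set[str]] = defaultdict(set)
    for _s, p, o, doc in quads:
        for term in (p, o):
            if term in features:
                docs_using[term].add(doc)
    return {
        f: (len(docs_using[f]), len({pld_of(d) for d in docs_using[f]}))
        for f in features
    }


def host(uri: str) -> str:
    # Crude stand-in for a pay-level domain: just the URI's host name.
    return urlparse(uri).hostname or ""


# Hypothetical sample data, for illustration only.
quads = [
    ("ex:a", "rdfs:subClassOf", "ex:b", "http://a.example.org/1"),
    ("ex:c", "rdfs:subClassOf", "ex:d", "http://a.example.org/2"),
    ("ex:p", "rdf:type", "owl:TransitiveProperty", "http://b.example.net/1"),
]
features = {"rdfs:subClassOf", "owl:TransitiveProperty", "owl:hasKey"}
print(sorted(adoption_counts(quads, features, host).items()))
# [('owl:TransitiveProperty', (1, 1)), ('owl:hasKey', (0, 0)), ('rdfs:subClassOf', (2, 1))]
```

Reporting both counts side by side shows why a single number can mislead: `rdfs:subClassOf` appears in two documents here, but both come from one domain.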
4.1 Measures Used
In order to adequately characterise the uptake of various RDF(S) and OWL
features used in this corpus, we present different measures to quantify their
prevalence and prominence.
First, we look at the prevalence of use of different features, i.e., how often
they are used. Here, we must take into account the diversity of the data under
analysis, in which a few domains account for many documents, many domains
account for only a few documents, and so forth [38]. We thus present two simple metrics:
²⁰ A pay-level domain is a direct sub-domain of a top-level domain (TLD) or of a
second-level country domain (ccSLD), e.g., dbpedia.org, bbc.co.uk. This gives us
our notion of “domain”.
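The pay-level domain of a document can be approximated from its URI as sketched below. The set `CC_SLDS` is a small, hand-picked assumption covering only a few ccSLDs; a production crawler would rely on a complete suffix list (for example, the Public Suffix List). The function name `pay_level_domain` is hypothetical.

```python
from urllib.parse import urlparse

# Assumption: a tiny, hand-picked set of second-level country domains (ccSLDs).
# A real crawler would use a complete list such as the Public Suffix List.
CC_SLDS = {"co.uk", "ac.uk", "org.uk", "com.au"}


def pay_level_domain(uri: str) -> str:
    """Return the label directly below a TLD (e.g. dbpedia.org), or directly
    below a known ccSLD (e.g. bbc.co.uk), for the host of the given URI."""
    host = (urlparse(uri).hostname or "").rstrip(".")
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in CC_SLDS:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


print(pay_level_domain("http://dbpedia.org/resource/IBM"))   # dbpedia.org
print(pay_level_domain("http://www.bbc.co.uk/programmes/"))  # bbc.co.uk
```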
 