RDFS and OWL Reasoning for Linked Data - Reasoning Web


Doc: number of documents using the language feature.

Dom: number of pay-level-domains (i.e., sites) using the language feature.

However, raw counts do not reflect the reality that the use of an OWL feature in one important ontology or vocabulary may often have greater practical impact than use in a thousand obscure documents. Thus, we also look at the prominence of use of different features. We use PageRank to quantify our notion of prominence: PageRank calculates a variant of the eigenvector centrality of nodes (e.g., documents) in a graph, where, taking directed links as “positive votes”, the resulting scores help characterise the relative prominence (i.e., centrality) of particular documents on the Web [55,31].

In particular, we first rank documents in the corpus. To construct the graph, we follow Linked Data principles and consider sources as nodes, where a directed edge $(s_1, s_2) \in S \times S$ is extended from source $s_1$ to $s_2$ iff $\mathsf{get}(s_1)$ contains (in any triple position) a URI that dereferences to document $s_2$ (i.e., there exists a $u \in \mathsf{terms}(\mathsf{get}(s_1))$ such that $\mathsf{redirs}(u) = s_2$). We also prune edges to only consider $(s_1, s_2)$ when $s_1$ and $s_2$ are non-empty sources in our corpus. We then apply a standard PageRank analysis over the resulting directed graph, using the power iteration method with ten iterations. For reasons of space, we refer the interested reader to [55] for more detail on PageRank, and to the following thesis [38] for more details on the particular algorithms used for this paper.
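The graph construction and ranking described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from [38]: the input dictionaries `docs` (source to the URIs it mentions) and `redirs` (URI to the source it dereferences to), the damping factor of 0.85, the uniform redistribution of rank from sources without out-links, and the dropping of self-links are all assumptions made for the example; only the ten power iterations and the pruning to non-empty sources come from the text.

```python
from collections import defaultdict


def build_source_graph(docs, redirs):
    """Build a directed source graph.

    docs:   dict mapping a source to the URIs it mentions in any triple
            position (hypothetical stand-in for terms(get(s))).
    redirs: dict mapping a URI to the source it dereferences to
            (hypothetical stand-in for redirs(u)).
    """
    # Only non-empty sources in the corpus become nodes.
    nodes = [s for s, uris in docs.items() if uris]
    edges = defaultdict(set)
    for s1 in nodes:
        for u in docs[s1]:
            s2 = redirs.get(u)
            # Prune edges whose target is unknown or an empty source.
            # Dropping self-links is an assumption of this sketch.
            if s2 is not None and s2 != s1 and docs.get(s2):
                edges[s1].add(s2)
    return nodes, edges


def pagerank(nodes, edges, damping=0.85, iterations=10):
    """Power-iteration PageRank; damping=0.85 is an assumed value."""
    n = len(nodes)
    rank = {s: 1.0 / n for s in nodes}
    for _ in range(iterations):
        # Rank held by sources without out-links is spread uniformly.
        dangling = sum(rank[s] for s in nodes if not edges.get(s))
        nxt = {s: (1.0 - damping) / n + damping * dangling / n for s in nodes}
        for s1, targets in edges.items():
            share = damping * rank[s1] / len(targets)
            for s2 in targets:
                nxt[s2] += share
        rank = nxt
    return rank


if __name__ == "__main__":
    # Toy corpus: three linked sources and one empty source "d".
    toy_docs = {"a": ["u_b", "u_c", "u_d"], "b": ["u_c"], "c": ["u_a"], "d": []}
    toy_redirs = {"u_a": "a", "u_b": "b", "u_c": "c", "u_d": "d"}
    toy_nodes, toy_edges = build_source_graph(toy_docs, toy_redirs)
    for source, score in sorted(pagerank(toy_nodes, toy_edges).items()):
        print(source, round(score, 4))
```

Because the teleport and dangling terms redistribute all rank mass in each iteration, the scores sum to one, which is what allows them to be read as probabilities under the random surfer model.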

With PageRank scores computed for all documents in the corpus, for each RDFS and OWL language feature, we then present:

Rank: the sum of PageRank scores for documents in which the language feature is used.
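As a rough illustration of how the three measures (Doc, Dom and Rank) could be aggregated per feature, consider the sketch below. The function name `survey` and its inputs are hypothetical: `features_by_doc` maps each document to the language features it uses, `domain_of` maps each document to a precomputed pay-level domain, and `pagerank_scores` holds the per-document PageRank scores. How features are detected and how pay-level domains are extracted is not covered here.

```python
from collections import defaultdict


def survey(features_by_doc, domain_of, pagerank_scores):
    """Aggregate Doc, Dom and Rank for each language feature.

    All three inputs are hypothetical, precomputed lookup tables.
    """
    docs = defaultdict(set)    # feature -> documents using it
    doms = defaultdict(set)    # feature -> pay-level domains using it
    rank = defaultdict(float)  # feature -> summed PageRank of its documents
    for doc, features in features_by_doc.items():
        # set() ensures a feature is counted once per document.
        for feature in set(features):
            docs[feature].add(doc)
            doms[feature].add(domain_of[doc])
            rank[feature] += pagerank_scores.get(doc, 0.0)
    return {
        f: {"Doc": len(docs[f]), "Dom": len(doms[f]), "Rank": rank[f]}
        for f in docs
    }


if __name__ == "__main__":
    toy_features = {
        "d1": ["owl:sameAs", "rdfs:subClassOf"],
        "d2": ["owl:sameAs"],
        "d3": ["rdfs:subClassOf", "rdfs:subClassOf"],
    }
    toy_domains = {"d1": "example.org", "d2": "example.org", "d3": "example.com"}
    toy_scores = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
    for feature, row in sorted(survey(toy_features, toy_domains, toy_scores).items()):
        print(feature, row)
```

In this toy input, a feature used in two documents on the same site has a Doc count of two but a Dom count of one, while its Rank depends only on how prominent those documents are.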

With respect to Rank, under the random surfer model of PageRank [55], given an agent starting from a random location and traversing documents on (our sample of) the Web of Data through randomly selected dereferenceable URIs, the Rank value for a feature approximates the probability with which that agent will be at a document using that feature after traversing ten links. In other words, the score indicates the likelihood that an agent, operating over the Web of Data by dereferencing URIs, encounters a given feature during a random walk.

The graph extracted from the corpus consists of 7.411 million nodes and 198.6 million edges. Table 4 presents the top-10 ranked documents in our corpus, which are dominated by core meta-vocabularies, documents linked therefrom, and other popular vocabularies.²¹

4.2 Survey of RDF(S)/OWL Features

Table 5 presents the results of the survey of RDF(S) and OWL usage in our

corpus, where for features with non-trivial semantics, we present the measures

²¹ We ran a similar analysis with links to and from the core RDF(S) and OWL vocabularies disabled. The results for the feature analysis remained similar; the main change was that owl:sameAs dropped several positions in terms of the sum of PageRank.
