Web of Linked Data. Yet, so far, little attention has been paid to the effect
of links between datasets on federated querying.
In this paper, we present LAW, a link-aware approach to source selection
for federated querying over the Web of Data. We redefine the RDF graph as
the RDF triple link graph to reveal links between triples in a single dataset
or across multiple datasets. We also define basic graph patterns in SPARQL as
triple pattern link graphs to reveal links between triple patterns. To bridge the
gap between triple link graphs and triple pattern link graphs, we design a special
statistical model called the property link graph to approximate the links that
exist in real linked data. Moreover, LAW provides a distributed join execution
mechanism that minimises network traffic while executing selection plans.
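The formal definitions are given in later sections. As a purely illustrative sketch, and not the definition used by LAW, assume that two triples are linked when the object of one is the subject of the other. The function name `triple_link_graph` and the sample triples are hypothetical.

```python
from collections import defaultdict

def triple_link_graph(triples):
    """Return edges (i, j) where the object of triple i is the subject of triple j.

    ASSUMPTION: this object-to-subject rule is only one plausible notion of a
    "link between triples"; it is not taken from the paper.
    """
    # Index triples by subject so each object can be looked up directly.
    by_subject = defaultdict(list)
    for j, (subject, _, _) in enumerate(triples):
        by_subject[subject].append(j)

    edges = []
    for i, (_, _, obj) in enumerate(triples):
        for j in by_subject.get(obj, []):
            if i != j:
                edges.append((i, j))
    return edges

# Hypothetical triples; the first could live in one dataset, the others in another.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]
edges = triple_link_graph(triples)
```

Under this assumption, only the first two triples are linked, because `ex:bob` is the object of one and the subject of the other. The link holds whether or not the two triples are stored in the same dataset.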
Our main contribution is threefold. (1) We formalize
the RDF triple link graph and the triple pattern link graph. (2) We propose an
efficient approach to source selection. (3) We perform a comprehensive simulation
study based on a real dataset to evaluate our approach.
The remainder of this paper is structured as follows. In Section 2 we review
related works. In Section 3 we present the background knowledge. Section 4
describes the statistical model. Source selection and the execution of selection
plans are presented in Section 5. An evaluation of our approach is given in Section
6. Finally, we conclude and discuss future directions in Section 7.
2 Related Works
DARQ [8] extends the popular query processor Jena ARQ to an engine for fed-
erated SPARQL queries. It requires users to explicitly supply a configuration
file which enables the query engine to decompose a query into sub-queries and
optimize joins based on predicate selectivity. SemWIQ [6] requires that all
subjects be variables and that the type of each subject variable be explicitly or
implicitly defined. Additional information (another triple pattern or DL
constraints) is needed to determine the type of the subject of a triple pattern. It
uses this additional information and extensive RDF statistics to decompose the
original user query. DARQ [8] and SemWIQ [6] implicitly assume that RDF
triples are independent of each other: if the property or subject class of a
triple pattern is defined by a dataset, then that dataset is considered relevant.
FedX [10] also implicitly adopts the triple independence assumption. It sends a
SPARQL ASK query to every known data source to check whether the source
contains data matching each triple pattern in a user query. FedSearch [7] is based
on FedX and extends it with sophisticated static optimization strategies. If the
number of known data sources is very large (which is common in an open setting),
query performance may suffer. SPLENDID [5] relies on the VoID descriptions
published by remote data sources. However, a VoID description is not an integral
part of the Linked Data principles [1].
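The ASK-based selection described above can be sketched as follows. This is a minimal simulation and not FedX's actual implementation: each remote endpoint is replaced by an in-memory set of triples, the function `ask` stands in for a SPARQL ASK request, and all names and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None marks a variable position.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

def ask(dataset: Set[Triple], pattern: Pattern) -> bool:
    """Stand-in for a SPARQL ASK request: True if any triple matches the pattern."""
    return any(
        all(term is None or term == value for term, value in zip(pattern, triple))
        for triple in dataset
    )

def select_sources(
    sources: Dict[str, Set[Triple]], patterns: List[Pattern]
) -> Dict[Pattern, List[str]]:
    """Probe every source once per triple pattern, treating patterns independently."""
    return {
        pattern: sorted(name for name, data in sources.items() if ask(data, pattern))
        for pattern in patterns
    }

# Hypothetical sources (ASSUMPTION: in-memory sets replace remote endpoints).
sources = {
    "A": {("ex:alice", "foaf:knows", "ex:bob")},
    "B": {
        ("ex:bob", "foaf:name", '"Bob"'),
        ("ex:carol", "foaf:knows", "ex:dave"),
    },
}
# Patterns of the query: ?x foaf:knows ?y . ?y foaf:name ?n
knows: Pattern = (None, "foaf:knows", None)
name: Pattern = (None, "foaf:name", None)
selected = select_sources(sources, [knows, name])
```

In this example, source `B` is selected for the `foaf:knows` pattern even though its only matching triple points to `ex:dave`, who has no `foaf:name` in any source, so that triple cannot contribute to the join. The number of probes is the number of sources multiplied by the number of triple patterns, which is why this strategy becomes costly when many sources are known.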
In other cases, users are required to provide additional information to de-
termine the relevant data sources. For instance, [13] theoretically describes a
solution called Distributed SPARQL for distributed SPARQL query on the top
 