applied to the default graph. The Graph operator can be used to apply a pattern to
one or all of the named graphs. A named graph is referenced by a unique URI,
and for each graph that is used in the query, we need a pair (URI, graph) that specifies
where to find the corresponding RDF graph. If a variable is used in the Graph
operator instead of a specific graph URI, the pattern must be applied to all named
graphs.
As we want to execute SPARQL queries on large RDF graphs in a MapReduce
cluster, all graphs must be stored in the distributed file system. Applying a pattern
to one of the named graphs with Pig Latin simply means loading the corresponding
data.
P6. Persons in Graph graphURI Who Know Somebody
SP
Graph(graphURI, BGP(?a knows ?b))
PL
graph1 = LOAD 'pathToGraphURI'
USING RDFLOader() AS (s,p,o);
t1 = FILTER graph1 BY p == 'knows';
P6 = FOREACH t1 GENERATE s AS a, o AS b;
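The two cases of the Graph operator (a fixed graph URI versus a variable) can be modelled outside of Pig Latin with a few lines of Python. This is only a sketch of the semantics, not of the Pig Latin translation: the dictionary named_graphs, its sample triples, and the function names are hypothetical stand-ins for the (URI, graph) pairs stored in the distributed file system. In the variable case, the sketch also binds the graph variable to the URI of each graph, as SPARQL does.

```python
# Hypothetical in-memory stand-in for the (URI, graph) pairs;
# each graph is a list of (s, p, o) triples.
named_graphs = {
    "graphURI1": [("alice", "knows", "bob"), ("alice", "age", "30")],
    "graphURI2": [("carol", "knows", "dave")],
}


def bgp_knows(triples):
    """BGP(?a knows ?b): return one (a, b) pair per 'knows' triple."""
    return [(s, o) for (s, p, o) in triples if p == "knows"]


def graph_with_uri(uri):
    """Graph(uri, BGP(...)): evaluate the pattern on one named graph only."""
    return bgp_knows(named_graphs[uri])


def graph_with_variable():
    """Graph(?g, BGP(...)): evaluate the pattern on every named graph
    and bind ?g to the URI of the graph that produced each solution."""
    return [
        (uri, a, b)
        for uri, triples in named_graphs.items()
        for (a, b) in bgp_knows(triples)
    ]


fixed = graph_with_uri("graphURI1")
all_graphs = graph_with_variable()
```

With a fixed URI only the triples of that graph are inspected, which corresponds to loading a single path in the Pig Latin example above; with a variable, every named graph has to be read.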
Joins and Null values. As we use flat bags to represent solution mappings in
Pig Latin and all tuples of a bag have the same schema, we use null values to
indicate that a variable is unbound in a solution mapping. This typically occurs
when using OPTIONAL to add additional information to a solution mapping.
The result of OPTIONAL is a set of solution mappings (i.e., a bag in Pig Latin)
where the optional variables can be unbound for some solution mappings (i.e.,
some tuples of the bag contain null values). However, this is problematic if the
further processing of the query requires a join over these possibly unbound variables.
In SPARQL, an unbound variable is compatible with any other binding of that
variable, but since Pig Latin follows the relational algebra, a JOIN in Pig Latin is
null-rejecting. Assume we have two bags of solution mappings R, S with schemas
(A,B) and (B,C), where R can contain null values for variable B, as illustrated in
the following example.
R (A, B)        S (B, C)        R ⋈ S in SPARQL (A, B, C)
a1  b1          b1  c1          a1  b1  c1
a2  null        b2  c2          a2  b1  c1
                                a2  b2  c2
The second tuple of R is compatible with any tuple of S since variable B is unbound.
In Pig Latin, we would get only one tuple as the join result, since the second tuple of R
does not match any tuple of S. To get the same result in Pig Latin, we split R into
two bags (with and without null values) and process them separately: we
perform an equi-join for all tuples without null values and a cross product for the
tuples with null values.
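The split strategy can be checked on the example with a small Python model. This is a sketch of the join semantics only, not Pig Latin code: None stands in for a null value, the function names are hypothetical, and it assumes, as in the example, that only R can contain nulls for B. The two partial results are concatenated to form the final bag.

```python
# Toy model of the example; None represents an unbound variable (null).
R = [("a1", "b1"), ("a2", None)]   # schema (A, B)
S = [("b1", "c1"), ("b2", "c2")]   # schema (B, C)


def null_rejecting_join(r, s):
    """Relational equi-join on B: tuples with a null B never match."""
    return [
        (a, b, c)
        for (a, b) in r
        for (sb, c) in s
        if b is not None and b == sb
    ]


def null_aware_join(r, s):
    """Join that treats an unbound B in r as compatible with any B in s.

    Assumption: s contains no null values for B.
    """
    # Split r into tuples with and without a binding for B.
    bound = [t for t in r if t[1] is not None]
    unbound = [t for t in r if t[1] is None]
    # Equi-join for the tuples whose B is bound.
    joined = null_rejecting_join(bound, s)
    # Cross product for the tuples whose B is unbound;
    # B takes its value from the matching tuple of s.
    crossed = [(a, sb, c) for (a, _) in unbound for (sb, c) in s]
    # Combine both partial results.
    return joined + crossed


plain = null_rejecting_join(R, S)
sparql_like = null_aware_join(R, S)
```

On the example, the null-rejecting join yields only (a1, b1, c1), whereas the split version yields the three tuples of the SPARQL result shown above.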