processing are episodic and elastic, so the trend is to leverage resources from the
cloud when possible. A number of cloud data-processing platforms have emerged
recently to support such applications, many of them based on query-processing
infrastructure similar to relational query engines. However, the relational model
and algebra have some limitations with respect to the requirements of Semantic Web
processing (a large number of joins, irregular structure, and inferencing combined
with querying), and these limitations have an appreciable negative impact in the
MapReduce context.
This chapter reviews query evaluation techniques for graph pattern queries on
MapReduce platforms in terms of two query algebras: derivatives of relational
algebra and an alternative algebra called the Nested TripleGroup Data Model and
Algebra (NTGA). It discusses the advantages of NTGA over relational-style query
plans and data representations. These advantages stem from the concurrent execution
of “star joins,” which reduces workflow length and enables shared table scans while
keeping the footprint of intermediate results small. The chapter presents some
evaluation results that show a performance advantage of up to 60% for relatively
basic queries involving 2 to 3 star patterns. This advantage is expected to be even
larger for more complex queries with more star patterns because of the concurrent
star-join execution enabled by NTGA plans.
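The idea of computing several star joins concurrently can be illustrated with a small in-memory sketch. It is not the chapter's NTGA operator set or a real MapReduce job: the triples, the property names, and the functions `map_phase`, `reduce_phase`, and `star_joins` are all hypothetical, chosen only to show how a single grouping pass on the subject can serve every star pattern at once, where a relational-style plan would need one join per pair of properties.

```python
from collections import defaultdict

# Toy RDF triples (subject, property, object); made-up data for illustration.
TRIPLES = [
    ("p1", "label", "Widget"),
    ("p1", "price", "10"),
    ("p1", "vendor", "v1"),
    ("p2", "label", "Gadget"),
    ("v1", "name", "Acme"),
    ("v1", "country", "US"),
]

# Two star patterns, each described by the properties its subject must have.
STAR_PATTERNS = {
    "product": {"label", "price", "vendor"},
    "vendor": {"name", "country"},
}


def map_phase(triples, wanted_props):
    """Map: one shared scan that keeps relevant triples, keyed by subject."""
    for s, p, o in triples:
        if p in wanted_props:
            yield s, (p, o)


def reduce_phase(keyed_pairs, star_patterns):
    """Reduce: build one group of triples per subject, then test each group
    against every star pattern in the same pass."""
    groups = defaultdict(list)
    for subject, pair in keyed_pairs:
        groups[subject].append(pair)

    matches = defaultdict(dict)
    for subject, pairs in groups.items():
        props = {p for p, _ in pairs}
        for name, required in star_patterns.items():
            if required <= props:  # the group covers the whole star pattern
                matches[name][subject] = sorted(
                    (p, o) for p, o in pairs if p in required
                )
    return matches


def star_joins(triples, star_patterns):
    """Evaluate all star patterns with a single map/group/reduce cycle."""
    wanted = set().union(*star_patterns.values())
    return reduce_phase(map_phase(triples, wanted), star_patterns)


result = star_joins(TRIPLES, STAR_PATTERNS)
```

In this sketch, subject `p2` is dropped because its group lacks `price` and `vendor`, while `p1` and `v1` each yield one matching group. Joining the two stars on the `vendor` object would be a separate, later step.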
Ongoing and future work in NTGA optimization is focused on adding the necessary
extensions (logical and physical operators and query rewriting rules) to enable
translation of more complex graph pattern queries to NTGA, such as graph patterns
with unbound properties, graph patterns with optional fragments, ontological
queries, and analytical queries. Preliminary results for some of these more complex
classes, specifically ontological queries, have shown performance advantages of up
to orders of magnitude and are thus very promising.