Information Technology Reference
In-Depth Information
investigated the space complexity of XPath evaluation on streams as a
function of the query size, and showed that the exponential dependence
is avoidable. An optimal algorithm whose memory depends only linearly
on the query size (for some types of queries, the dependence is even
logarithmic) has been exhibited. The other major source of memory
consumption has also been studied: buffers of (representations of) docu-
ment fragments. Algorithms that support advanced features of the XPath
language, such as predicates, fully l edged evaluation (as opposed to
i ltering only), or closure axes, face the need to store fragments of the
document stream during the evaluation. The buffering seems necessary
because, in many cases, at the time the algorithm encounters certain XML
elements in the stream, it does not have enough information to conclude
whether these elements should be part of the output or not (the decision
depends on unresolved predicates, whose i nal value is to be determined
by subsequent elements in the stream). Indeed, all the advanced evaluation
algorithms maintain some form of buffer (e.g., the stack of the XPush
Machine [29], the BPDT buffers of the XSQ system [25,26], the predicate
buffers of TurboXPath, and the buffer trees of the FluX query engine [35]).
It has been noted anecdotally that for certain queries and documents,
buffering seems unavoidable. However, to date, a formal and theoretical
study that quantii es the amount of buffering needed to support advanced
features of XPath has been demonstrated.
With respect to the advanced features of XPath, two major classes of
XPath evaluation problems necessitate buffering. Space lower bounds
quantify the amount of buffering required in terms of some document
properties such as nonrecursive document, star-free, and so on.
Bar-Yosseff et al. [33,37] investigated the upper two types of space
complexity of XPath evaluation on streams and proved, for any algorithm
A that evaluates a star-free XPath query Q on an XML streaming docu-
ment D, the minimum bits of space that A needs to use. These two types
of space complexity are so-called theoretic lower bounds for the query
processing over XML data streams. These two classes of evaluation prob-
lems are the following.
9.3.3.3.1
There are two typical modes of query evaluation: fully l edged evaluation
(i.e., outputting all the document elements selected by the query) and
i ltering (i.e., outputting a bit indicating whether the query selects any
node from the document or not) [29,30]. It has been demonstrated that
fully l edged evaluation of queries with predicates requires substantially
more space than just i ltering documents using such queries. This con-
currency lower bound is based on fully l edged evaluation. The lower
bound is stated in terms of a property of documents called “concurrency”
denoted as Ω(CONCUR(D,Q)). This lower bound is dei ned as the concept
of a concurrency.
Full-l edged Evaluation of Queries with Predicates
Search WWH ::




Custom Search