A Data Stream View of Scientific Workflow - Grid Computing: Infrastructure, Service, and Applications

Information Technology Reference

In-Depth Information

investigated the space complexity of XPath evaluation on streams as a

function of the query size, and showed that the exponential dependence

is avoidable. An optimal algorithm whose memory depends only linearly

on the query size (for some types of queries, the dependence is even

logarithmic) has been exhibited. The other major source of memory

consumption has also been studied: buffers of (representations of) docu-

ment fragments. Algorithms that support advanced features of the XPath

language, such as predicates, fully l edged evaluation (as opposed to

i ltering only), or closure axes, face the need to store fragments of the

document stream during the evaluation. The buffering seems necessary

because, in many cases, at the time the algorithm encounters certain XML

elements in the stream, it does not have enough information to conclude

whether these elements should be part of the output or not (the decision

depends on unresolved predicates, whose i nal value is to be determined

by subsequent elements in the stream). Indeed, all the advanced evaluation

algorithms maintain some form of buffer (e.g., the stack of the XPush

Machine [29], the BPDT buffers of the XSQ system [25,26], the predicate

buffers of TurboXPath, and the buffer trees of the FluX query engine [35]).

It has been noted anecdotally that for certain queries and documents,

buffering seems unavoidable. However, to date, a formal and theoretical

study that quantii es the amount of buffering needed to support advanced

features of XPath has been demonstrated.

With respect to the advanced features of XPath, two major classes of

XPath evaluation problems necessitate buffering. Space lower bounds

quantify the amount of buffering required in terms of some document

properties such as nonrecursive document, star-free, and so on.

Bar-Yosseff et al. [33,37] investigated the upper two types of space

complexity of XPath evaluation on streams and proved, for any algorithm

A that evaluates a star-free XPath query Q on an XML streaming docu-

ment D, the minimum bits of space that A needs to use. These two types

of space complexity are so-called theoretic lower bounds for the query

processing over XML data streams. These two classes of evaluation prob-

lems are the following.

9.3.3.3.1

There are two typical modes of query evaluation: fully l edged evaluation

(i.e., outputting all the document elements selected by the query) and

i ltering (i.e., outputting a bit indicating whether the query selects any

node from the document or not) [29,30]. It has been demonstrated that

fully l edged evaluation of queries with predicates requires substantially

more space than just i ltering documents using such queries. This con-

currency lower bound is based on fully l edged evaluation. The lower

bound is stated in terms of a property of documents called “concurrency”

denoted as Ω(CONCUR(D,Q)). This lower bound is dei ned as the concept

of a concurrency.

Full-l edged Evaluation of Queries with Predicates

Grid Computing: Infrastructure, Service, and Applications

Search WWH ::

Custom Search

Home