Satisfiability of Simple XPath Fragments under 3/7/2016Fixed DTDs - Advances in Databases

Database Reference

In-Depth Information

The Glushkov automaton of a regular expression r is a 5-tuple G r

( Q, Σ r ,δ,q I ,F ), where Q = sym ( r # )

∪{q I }

is a set of states , Σ r is the set of

labels occurring in r , δ is a transition function , q I ∈ sym ( r # )isthe start state

of G r ,and F is a set of final states . Because of space limitation, the definitions

of δ and F are skipped (detains can be found in [2,9]). Let us show an example

instead. Figure 2(b) shows the Glushkov automaton G r

=( Q, Σ r ,δ,q I ,F )of

r =( a|b )( cb ) ∗ ,where Q =

{q I ,a 3 ,b 4 ,c 7 ,b 8

}

, Σ r =

{a, b, c}

, F =

{a 3 ,b 4 ,b 8

}

,and

δ is a transition function defined as follows.

δ ( q I ,a )= {a 3

δ ( q I ,b )=

{b 4

δ ( a 3 ,c )= δ ( b 4 ,c )= δ ( b 8 ,c )=

{c 7

δ ( c 7 ,b )=

{b 8

For any regular expression r , it holds that L ( r )= L ( G r ) and that there is a

one-to-one correspondence between sym ( r # )

\{q I }

and the set of states of G r .

This property is essential to our algorithms shown below.

4.2 Polynomial-Time Algorithm

We show a polynomial-time algorithm for solving SAT(XP {↓,↓ ∗ ,→ + ,← + } ) un-

der fixed DTDs. For a regular expression r ,by E ( r )wemeanthesetof ex-

tracted regular expressions obtained by extracting each '

'operatorthatis

not under any '

∗

' operator. For example, if r =( a|b ) ∗ c ( d|e|f ), then E ( r )=

{

Let us first show the outline of the algorithm. Let q = /ax [1] ::

lb [1] / ···/ax [ m ]:: lb [ m ]beaqueryand D =( d, s ) be a fixed DTD. The al-

gorithm uses the Glushkov automaton of the content model of each label in D .

In short, for each i =1 , ··· ,m , the algorithm computes the set of states of the

Glushkov automatons in D reachable from s via /ax [1] :: lb [1] / ···/ax [ i ]:: lb [ i ]

under D , and returns “no” (i.e., q is unsatisfiable) if the set becomes empty. More

concretely, the algorithm computes a set S i

( a|b ) ∗ cd, ( a|b ) ∗ ce, ( a|b ) ∗ cf}

for each i =1 , ··· ,m , and returns

“no” if S i =

for some i , returns “yes” otherwise. Here, S i is a set of pairs ( r, St ),

where r is an extracted regular expression of the “current” content model and

St is the set of states in G r

∅

reachable from s via /ax [1] :: lb [1] / ···/ax [ i ]:: lb [ i ]

under D .

Now let us show the “main” part of the algorithm. To obtain S i , according

to the current location step ax [ i ]:: lb [ i ] we use four subroutines in lines 2 to 9

(defined later).

Input: A query q = /ax [1] :: lb [1] / ···/ax [ m ]:: lb [ m ]andafixedDTD D =( d, s ).

Output: “yes” or “no”.

begin

for i =1 to m do

if ax [ i ]=' ↓ ' then

S i ← do child( D, q, i );

else if ax [ i ]=' ↓ ∗ ' then

Advances in Databases

Search WWH ::

Custom Search

Home