Database Reference
In-Depth Information
The Glushkov automaton of a regular expression r is a 5-tuple G r
=
( Q, Σ r ,δ,q I ,F ), where Q = sym ( r # )
∪{q I }
is a set of states , Σ r is the set of
labels occurring in r , δ is a transition function , q I ∈ sym ( r # )isthe start state
of G r ,and F is a set of final states . Because of space limitation, the definitions
of δ and F are skipped (detains can be found in [2,9]). Let us show an example
instead. Figure 2(b) shows the Glushkov automaton G r
=( Q, Σ r ,δ,q I ,F )of
r =( a|b )( cb ) ,where Q =
{q I ,a 3 ,b 4 ,c 7 ,b 8
}
, Σ r =
{a, b, c}
, F =
{a 3 ,b 4 ,b 8
}
,and
δ is a transition function defined as follows.
δ ( q I ,a )= {a 3
},
δ ( q I ,b )=
{b 4
},
δ ( a 3 ,c )= δ ( b 4 ,c )= δ ( b 8 ,c )=
{c 7
},
δ ( c 7 ,b )=
{b 8
}.
For any regular expression r , it holds that L ( r )= L ( G r ) and that there is a
one-to-one correspondence between sym ( r # )
\{q I }
and the set of states of G r .
This property is essential to our algorithms shown below.
4.2 Polynomial-Time Algorithm
We show a polynomial-time algorithm for solving SAT(XP {↓,↓ ,→ + ,← + } ) un-
der fixed DTDs. For a regular expression r ,by E ( r )wemeanthesetof ex-
tracted regular expressions obtained by extracting each '
|
'operatorthatis
not under any '
' operator. For example, if r =( a|b ) c ( d|e|f ), then E ( r )=
{
.
Let us first show the outline of the algorithm. Let q = /ax [1] ::
lb [1] / ···/ax [ m ]:: lb [ m ]beaqueryand D =( d, s ) be a fixed DTD. The al-
gorithm uses the Glushkov automaton of the content model of each label in D .
In short, for each i =1 , ··· ,m , the algorithm computes the set of states of the
Glushkov automatons in D reachable from s via /ax [1] :: lb [1] / ···/ax [ i ]:: lb [ i ]
under D , and returns “no” (i.e., q is unsatisfiable) if the set becomes empty. More
concretely, the algorithm computes a set S i
( a|b ) cd, ( a|b ) ce, ( a|b ) cf}
for each i =1 , ··· ,m , and returns
“no” if S i =
for some i , returns “yes” otherwise. Here, S i is a set of pairs ( r, St ),
where r is an extracted regular expression of the “current” content model and
St is the set of states in G r
reachable from s via /ax [1] :: lb [1] / ···/ax [ i ]:: lb [ i ]
under D .
Now let us show the “main” part of the algorithm. To obtain S i , according
to the current location step ax [ i ]:: lb [ i ] we use four subroutines in lines 2 to 9
(defined later).
Input: A query q = /ax [1] :: lb [1] / ···/ax [ m ]:: lb [ m ]andafixedDTD D =( d, s ).
Output: “yes” or “no”.
begin
1.
for i =1 to m do
2.
if ax [ i ]=' ' then
3.
S i do child( D, q, i );
else if ax [ i ]=' ' then
4.
 
Search WWH ::




Custom Search