Database Reference
In-Depth Information
The
Glushkov automaton
of a regular expression
r
is a 5-tuple
G
r
=
(
Q, Σ
r
,δ,q
I
,F
), where
Q
=
sym
(
r
#
)
∪{q
I
}
is a set of
states
,
Σ
r
is the set of
labels occurring in
r
,
δ
is a
transition function
,
q
I
∈ sym
(
r
#
)isthe
start state
of
G
r
,and
F
is a set of
final states
. Because of space limitation, the definitions
of
δ
and
F
are skipped (detains can be found in [2,9]). Let us show an example
instead. Figure 2(b) shows the Glushkov automaton
G
r
=(
Q, Σ
r
,δ,q
I
,F
)of
r
=(
a|b
)(
cb
)
∗
,where
Q
=
{q
I
,a
3
,b
4
,c
7
,b
8
}
,
Σ
r
=
{a, b, c}
,
F
=
{a
3
,b
4
,b
8
}
,and
δ
is a transition function defined as follows.
δ
(
q
I
,a
)=
{a
3
},
δ
(
q
I
,b
)=
{b
4
},
δ
(
a
3
,c
)=
δ
(
b
4
,c
)=
δ
(
b
8
,c
)=
{c
7
},
δ
(
c
7
,b
)=
{b
8
}.
For any regular expression
r
, it holds that
L
(
r
)=
L
(
G
r
) and that there is a
one-to-one correspondence between
sym
(
r
#
)
\{q
I
}
and the set of states of
G
r
.
This property is essential to our algorithms shown below.
4.2 Polynomial-Time Algorithm
We show a polynomial-time algorithm for solving SAT(XP
{↓,↓
∗
,→
+
,←
+
}
) un-
der fixed DTDs. For a regular expression
r
,by
E
(
r
)wemeanthesetof
ex-
tracted regular expressions
obtained by extracting each '
|
'operatorthatis
not under any '
∗
' operator. For example, if
r
=(
a|b
)
∗
c
(
d|e|f
), then
E
(
r
)=
{
.
Let us first show the outline of the algorithm. Let
q
=
/ax
[1] ::
lb
[1]
/ ···/ax
[
m
]::
lb
[
m
]beaqueryand
D
=(
d, s
) be a fixed DTD. The al-
gorithm uses the Glushkov automaton of the content model of each label in
D
.
In short, for each
i
=1
, ··· ,m
, the algorithm computes the set of states of the
Glushkov automatons in
D
reachable from
s
via
/ax
[1] ::
lb
[1]
/ ···/ax
[
i
]::
lb
[
i
]
under
D
, and returns “no” (i.e.,
q
is unsatisfiable) if the set becomes empty. More
concretely, the algorithm computes a set
S
i
(
a|b
)
∗
cd,
(
a|b
)
∗
ce,
(
a|b
)
∗
cf}
for each
i
=1
, ··· ,m
, and returns
“no” if
S
i
=
for some
i
, returns “yes” otherwise. Here,
S
i
is a set of pairs (
r, St
),
where
r
is an extracted regular expression of the “current” content model and
St
is the set of states in
G
r
∅
reachable from
s
via
/ax
[1] ::
lb
[1]
/ ···/ax
[
i
]::
lb
[
i
]
under
D
.
Now let us show the “main” part of the algorithm. To obtain
S
i
, according
to the current location step
ax
[
i
]::
lb
[
i
] we use four subroutines in lines 2 to 9
(defined later).
Input: A query
q
=
/ax
[1] ::
lb
[1]
/ ···/ax
[
m
]::
lb
[
m
]andafixedDTD
D
=(
d, s
).
Output: “yes” or “no”.
begin
1.
for
i
=1
to
m
do
2.
if
ax
[
i
]='
↓
'
then
3.
S
i
←
do child(
D, q, i
);
else if
ax
[
i
]='
↓
∗
'
then
4.