Database Reference
In-Depth Information
Fig. 2. (a) Example document of Fig. 1 with repeated patterns replaced by non-terminals. (b) Repeated patterns.
By replacing each match with a corresponding instantiation, we get the following grammar, Grammar 3, which is more compact than Grammar 2:

S → r(k1(CDE(m(ε,ε)), k2(HGE(n(ε,ε)), k3(CDE(o(ε,ε)), k4(HGE(p(ε,ε)), ε)))), ε)
CDE(X) → c(d(JB(X), e(ε,ε)), ε)
HGE(X) → h(g(JB(X), e(ε,ε)), ε)
JB(X) → j(X, b(ε,ε))

Grammar 3: A grammar sharing patterns by using parameterized rules
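To make the role of the parameterized rules concrete, the following minimal Python sketch stores Grammar 3 as data and expands the start rule back into the full binary tree. The tuple encoding and the helper names T, N, expand and labels are assumptions made for this illustration; they are not taken from the paper.

```python
EPS = None  # stands for the empty tree ε

def T(label, first_child=EPS, next_sibling=EPS):
    """Terminal node: always first-child and next-sibling (hypothetical helper)."""
    return ("T", label, (first_child, next_sibling))

def N(name, *args):
    """Call of a non-terminal with actual parameters (hypothetical helper)."""
    return ("N", name, args)

# Grammar 3: rule name -> (formal parameters, right-hand side)
RULES = {
    "S": ((), T("r",
                T("k1", N("CDE", T("m")),
                  T("k2", N("HGE", T("n")),
                    T("k3", N("CDE", T("o")),
                      T("k4", N("HGE", T("p")))))))),
    "CDE": (("X",), T("c", T("d", N("JB", "X"), T("e")))),
    "HGE": (("X",), T("h", T("g", N("JB", "X"), T("e")))),
    "JB": (("X",), T("j", "X", T("b"))),
}

def expand(term, env=None):
    """Replace every non-terminal call by its instantiated rule body.

    Returns EPS or a triple (label, first_child, next_sibling).
    """
    env = env or {}
    if term is EPS:
        return EPS
    if isinstance(term, str):          # formal parameter such as "X"
        return env[term]
    kind, label, args = term
    args = tuple(expand(a, env) for a in args)
    if kind == "T":
        return (label, args[0], args[1])
    params, body = RULES[label]        # non-terminal: instantiate its rule
    return expand(body, dict(zip(params, args)))

def labels(tree):
    """Terminal labels of an expanded tree in pre-order (ε is skipped)."""
    if tree is EPS:
        return []
    label, first_child, next_sibling = tree
    return [label] + labels(first_child) + labels(next_sibling)
```

Under this encoding, labels(expand(RULES["S"][1])) lists 29 terminal nodes, four of which are b-nodes, although the symbol b is written only once in the grammar, in the rule for JB(X).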
All terminal nodes except ε have two parameters, i.e. the first-child and the next-sibling. However, non-terminal nodes may have an arbitrary number of parameters.
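The two parameters of a terminal can be read as a binary encoding of an ordinary tree: the first parameter leads to the first child, and following the second parameter repeatedly enumerates the remaining siblings. The sketch below decodes such a binary term into nested child lists. The triple representation and the function names children and to_unranked are assumptions for illustration only; the sample term is the expansion of CDE(m(ε,ε)) from Grammar 3.

```python
def children(node):
    """Child list of a node given as (label, first_child, next_sibling).

    None stands for ε. The first child is node[1]; every further child
    is reached through the next-sibling slot of the previous one.
    """
    kids = []
    child = node[1]
    while child is not None:
        kids.append(child)
        child = child[2]
    return kids

def to_unranked(node):
    """Convert the binary encoding into (label, [children...])."""
    return (node[0], [to_unranked(c) for c in children(node)])

# c(d(j(m(ε,ε), b(ε,ε)), e(ε,ε)), ε), i.e. CDE(m(ε,ε)) fully expanded
m = ("m", None, None)
b = ("b", None, None)
e = ("e", None, None)
j = ("j", m, b)      # first child m, next sibling b
d = ("d", j, e)      # first child j, next sibling e
c = ("c", d, None)   # first child d, no next sibling

decoded = to_unranked(c)
```

Here decoded is ("c", [("d", [("j", [("m", [])]), ("b", [])]), ("e", [])]): c has the children d and e, d has the children j and b, and j has the single child m.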
2.3 Node Selection by Grammar Paths
The grammar path (GP): Each path to a selected node in any given XML document D corresponds to exactly one grammar path (GP) in the compressed grammar G of D. Intuitively, GP describes not only which grammar rules are called to find the selected node, but also from where in a given grammar rule the next grammar rule is called. For this purpose, GP contains an alternating sequence of grammar rule names and index positions within these grammar rules of the occurrences of non-terminals Ni calling the next grammar rule. Additionally, the last number in GP is the index of the terminal symbol corresponding to the selected node in the last grammar rule collected by GP.

For example, if we apply the query /k2//b to Grammar 3, GP is initially [S], i.e. contains only the start rule. When k2 is found in the start rule S, no other grammar rule has been used, thus GP is still [S]. When the search for a descendant b of k2 continues via k2's first-child, the 2nd non-terminal in the rule for S, i.e. HGE(X), is called, and GP now is [S,2,HGE(X)]. Later, to find the first-child of g within the grammar rule for HGE(X), the 1st non-terminal, i.e. JB(X), is called. Finally, within the grammar rule for JB(X), we pick the 2nd terminal symbol, i.e. the symbol b, to complete GP. This grammar path (GP), i.e. [S,2,HGE(X),1,JB(X) : 2], corresponds to the b-node selected by the query /k2//b.

A formal definition of grammar paths, however omitting the rule names, is given in [12].
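The worked example can be replayed with a small Python sketch that follows a grammar path through Grammar 3 and returns the terminal symbol it ends at. Several points are assumptions of this illustration rather than definitions from the paper: rule bodies are kept as plain strings, positions are 1-based and counted in textual order, ε is not counted as a terminal, and the function names symbols and resolve are hypothetical.

```python
import re

# Grammar 3, right-hand sides as plain strings (rule names without "(X)")
RULES = {
    "S": "r(k1(CDE(m(ε,ε)),k2(HGE(n(ε,ε)),k3(CDE(o(ε,ε)),k4(HGE(p(ε,ε)),ε)))),ε)",
    "CDE": "c(d(JB(X),e(ε,ε)),ε)",
    "HGE": "h(g(JB(X),e(ε,ε)),ε)",
    "JB": "j(X,b(ε,ε))",
}

def symbols(rule):
    """Return (non-terminal calls, terminal labels) of a rule in textual order."""
    # Every symbol that takes parameters is followed by "(";
    # ε and the formal parameter X never are, so they are skipped.
    names = re.findall(r"(\w+)\(", RULES[rule])
    non_terminals = [n for n in names if n in RULES]
    terminals = [n for n in names if n not in RULES]
    return non_terminals, terminals

def resolve(rule_path, terminal_index):
    """Follow a grammar path and return the label of the selected terminal.

    rule_path alternates rule names and 1-based non-terminal positions,
    e.g. ["S", 2, "HGE", 1, "JB"]; terminal_index is the final number
    written after the colon in the text.
    """
    rule = rule_path[0]
    for i in range(1, len(rule_path), 2):
        position, callee = rule_path[i], rule_path[i + 1]
        non_terminals, _ = symbols(rule)
        if non_terminals[position - 1] != callee:
            raise ValueError(f"non-terminal {position} of {rule} is not {callee}")
        rule = callee
    _, terminals = symbols(rule)
    return terminals[terminal_index - 1]

selected = resolve(["S", 2, "HGE", 1, "JB"], 2)
```

For the path [S,2,HGE(X),1,JB(X) : 2] from the example, selected is "b". The consistency check inside the loop reflects that the rule names in a grammar path are redundant once the positions are known, which fits the remark that the rule names can be omitted.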