Information Technology Reference
In-Depth Information
3 The Algorithm and Complexity Analysis
3.1 Graph Theory Notation and Terminology
Let
Σ
be the set of the alphabets appearing in the key words which includes
the special symbol “
” as the space character.
be a set of text documents for pattern de-
tection. Each document
D
i
is a sequence
d
i,
0
···
Let
D
=
{
D
1
, D
2
,
···
, D
n
}
d
i,t
i
consisting of alphabets
from the set
Σ
, where the first and the last character
d
i,
0
=
d
i,t
i
=
,
t
i
+1
is the length of the document
D
i
.
Let
K
=
{
K
1
, K
2
,
···
, K
m
}
be the set of selected keywords. For each
keyword
K
i
=
k
i,
0
···
k
i,s
i
, the first and the last character
k
i,
0
=
k
i,s
i
=
,
s
i
+ 1 is the length of the keyword
K
i
.
Let
G
=(
V, A
) be a directed graph with vertex set
V
and arc set
A
.
N
+
(
v
) is the set of all out-neighbors of the vertex
v
.Thatis,
N
+
(
v
)=
{
u
.
N
−
(
v
) is the set of all in-neighbors of the vertex
v
.Thatis,
N
−
(
v
)=
∈
V
(
G
):
vu
∈
A
(
G
)
}
{
u
∈
V
(
G
):
uv
∈
A
(
G
)
}
.
Σ
be a labeling of
A
(
G
).
L
+
(
v
)=
N
+
(
v
)
Let
L
:
A
(
G
)
→
{
l
(
vu
):
u
∈
}
.
3.2 Construction of Searching Tree
For the purpose of finding keywords eciently, we use the following algorithm
to set up a searching tree for keyword searching.
3.2.1 Algorithm
K
{
K
1
, K
2
,
···
, K
m
}
Input.
: a set of keywords.
Output.
A rooted tree (called “searching tree”)
T
:
T
has a root
v
0
and
m
leaves. Each of the leaf represents a keyword; each arc of
T
is labeled
with a character
=
Σ
; for each leaf
v
,let
P
be the unique directed path
from the root
v
0
to
v
, the sequence of labels along the path
P
coincides
with characters of the keywords
K
. (Figure 4 gives a simple example of a
searching tree. )
Initial step.
T
has a root
v
0
and a vertex
v
1
, and an arc
v
0
v
1
with the label
L
(
v
0
v
1
)=
∈
.
i
←
1:
i
is the keyword index (current keyword
K
i
=
k
i,
0
···
k
i,s
i
that is
under processing,
k
i,
0
=
k
i,s
i
=
,
s
i
+ 1 is the length of the keyword
K
i
.)
1:
λ
is the level index (the character
k
i,λ
is currently under processing,
and
λ
is also current level of the tree that is under construction).
v
λ
←
←
v
1
: (the current vertex whose out-neighborhood is under construc-
tion.)