20>, respectively, an overlap is detected, and hence a lookahead is done in both sets. The lookahead determines that the next tuple from the right set (D2<21, 24>) ends before the next tuple from the left set (D2<30, 35>). Hence, D2<21, 24> is made the new terminator and D2<12, 18> is retained as the initiator. They are combined to form the output tuple D2<12, 24>. Now, initiator points to a D2 tuple while terminator points to a D3 tuple. Hence, initiator is advanced. Now, initiator (D3<40, 47>) lies completely after terminator (D3<12, 19>). Hence, initiator and terminator are swapped. This makes initiator point to D3<12, 19> and terminator point to D3<40, 47>, which form a proximal pair and are merged to give D3<12, 47> in the output set. Finally, initiator points to D4<12, 20>, and terminator points to D4<60, 80>. In this case, the distance between them is 40, which is greater than the maximum allowed distance, i.e., 30. Hence, they are not combined, and a lookahead needs to be done to determine which one of them should be discarded, and which one kept.
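The proximity check used in this walkthrough can be sketched in a few lines of Python. This is only an illustration, not the InfoSearch implementation: the names Hit and combine_if_proximal are hypothetical, the distance is assumed to be the terminator's start minus the initiator's end (which matches the D4 example, 60 - 20 = 40), and the lookahead logic and the handling of overlapping tuples are deliberately left out.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical tuple type: a match in one document."""
    doc: str    # document identifier, e.g. "D2"
    start: int  # start position of the match
    end: int    # end position of the match


def combine_if_proximal(initiator: Hit, terminator: Hit,
                        max_distance: int) -> Optional[Hit]:
    """Return the merged tuple, or None if the pair is not combined.

    Assumption: distance = terminator.start - initiator.end.
    Overlapping pairs are not handled here; in the text they are
    resolved by a lookahead, which this sketch omits.
    """
    if initiator.doc != terminator.doc:
        return None  # tuples from different documents are never combined
    # If the initiator lies completely after the terminator, swap them.
    if initiator.start > terminator.end:
        initiator, terminator = terminator, initiator
    distance = terminator.start - initiator.end
    if distance < 0 or distance > max_distance:
        return None  # overlapping (left to lookahead) or too far apart
    return Hit(initiator.doc, initiator.start, terminator.end)
```

With a maximum distance of 30, the D2 and D3 pairs from the walkthrough are merged (the D3 pair after the swap), while the D4 pair is rejected because its distance is 40.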
[Figure: block diagram. Components: User, PSL query, Pattern Validator, Pattern Processor, Graph Generator, PDG, Pattern Detector, keyword buffer, Index Interface, WordNet database, inverted index, document collection. Labeled data: {keyword1, keyword2, ...}, results, and tuples (hits) of the form <keyword, URL, position>.]
Fig. 6. InfoSearch architecture
4 Design and Implementation of InfoSearch
The InfoSearch architecture is shown in Figure 6. The user query specified in Pattern Specification Language is converted into a Pattern Detection Graph (PDG). Leaf nodes of the PDG represent simple patterns such as keywords, phrases or system defined patterns. Higher level nodes represent composite operators on these leaf nodes, or on other composite nodes. To detect and optimize common computations, the graph generator shares PDG nodes (and sub-graphs) wherever possible. This is achieved by generating