Database Reference
In-Depth Information
and position of sub-patterns is known, can a decision be made whether they can be
combined or not.
The inputs to the operators are sets of tuples containing the document ID, start offset,
and end offset of the corresponding pattern. Each tuple represents a single occurrence
of the corresponding pattern in the document collection. It is assumed that these sets
of tuples are sorted in ascending order of document ID and by start offset within each
document ID. The operators have to process the input sets tuple by tuple. First, they
have to ensure that the tuples to be merged have the same document ID. Second, they
have to determine which tuple is the initiator and which one is the terminator. Tuples
satisfying the criteria of the operator are combined and added to an output set. After the
operator is done processing the input sets, the output set is propagated to its parent.
Due to space limitations, we discuss only the NEAR operator. Please refer to [9] for
other operator and algorithm discussions.
3.1
The NEAR Operator
When the NEAR operator is processing two tuples from the input sets, it has to make a
decision whether the tuples are eligible for combination, and if not, decide which one
to keep and which one to discard. As mentioned earlier, the input tuples may either
be point tuples or interval tuples. To keep the forthcoming discussion generalized, we
assume that the input tuples have both start and end offsets. We now discuss the different
cases possible when we consider two input tuples, and the corresponding actions taken
in each case.
Merging strategy in NEAR: The inputs to the NEAR operator are two sets of tuples
corresponding to the left child and the right child, and an optional distance. Let the left
set be denoted by L and the right set by R . Let the distance be denoted by d . We arbitrar-
ily assign the first tuple from L as initiator , and the first tuple from R as terminator .
Let i s and i e denote the start offset and end offset of the initiator ,and t s and t e denote
the start offset and end offset of the terminator .Let i +1
be the next tuple from the set
which initiator belongs to, and t +1
be the next tuple from the set which terminator
belongs to.
If initiator and terminator do not belong to the same docID, we advance the
pointer which is pointing to a smaller docID. Since the sets are sorted by docID, this
is similar to a sort-merge operation. When initiator and terminator point to tuples
belonging to the same docID, three cases are possible.
Case 1 ( i e <t e ): This means that the assumed initiator ends before the assumed
terminator . The different possibilities are shown in Figure 3. We perform the follow-
ing sequence of actions:
if initiator and terminator overlap then
lookahead 2 to determine new initiator and terminator
go to the beginning of this operation and re-process the new
initiator
and
terminator
2
The Lookahead algorithm is explained below.
 
Search WWH ::




Custom Search