Every node in a PDG except the root has one or more parent nodes (also called
subscriber nodes). Leaf nodes propagate their input sets to their parent nodes. A
parent node, which corresponds to an operator such as OR or NEAR, thus receives
one or more sets of tuples as its input. The operator merges its input sets according
to the Proximal-Unique semantics for that operator to create an output set, which is
then propagated to the operator's own parent node. This propagation of merged sets
continues all the way up to the root, and the merged output of the root operator is
the result set for the query. For example, in Fig. 2, the input sets from the leaf nodes
are propagated to the FOLLOWED BY node, where the complex pattern is detected
over the interval <10, 12>.
[Figure: leaf nodes “Protein” (D1 <10, 10>) and “Clustering” (D1 <12, 12>) feed the FOLLOWED BY node, which sends D1 <10, 12> to its parent.]
Fig. 2. PDG corresponding to “Protein” FOLLOWED BY “Clustering”
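The merge step illustrated in Fig. 2 can be sketched in Python. This is only a minimal illustration, not the InfoSearch implementation: the `Occurrence` type and the `followed_by` function are hypothetical names, and the pairing rule used here (each right-hand occurrence is joined with the closest preceding left-hand occurrence in the same document) is an assumed stand-in for the Proximal-Unique semantics, which this section does not define.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Occurrence:
    """One tuple in an input or output set (hypothetical layout)."""
    doc: str    # document identifier, e.g. "D1"
    start: int  # offset at which the pattern starts
    end: int    # offset at which the pattern ends


def followed_by(left: Iterable[Occurrence],
                right: Iterable[Occurrence]) -> List[Occurrence]:
    """Merge two input sets into one output set.

    Assumed rule: pair each right occurrence with the closest left
    occurrence that ends before it starts, within the same document.
    """
    left = list(left)
    merged = set()
    for r in right:
        candidates = [l for l in left if l.doc == r.doc and l.end < r.start]
        if candidates:
            nearest = max(candidates, key=lambda l: l.end)
            merged.add(Occurrence(r.doc, nearest.start, r.end))
    # Sort so the propagated set has a deterministic order.
    return sorted(merged, key=lambda o: (o.doc, o.start, o.end))


# Input sets of the two leaf nodes in Fig. 2.
protein = [Occurrence("D1", 10, 10)]
clustering = [Occurrence("D1", 12, 12)]

# Output set that the FOLLOWED BY node would pass to its parent.
result = followed_by(protein, clustering)
```

With the leaf sets from Fig. 2, `result` holds a single occurrence in document D1 spanning the interval <10, 12>. Swapping the operands yields an empty set, because no "Clustering" occurrence precedes a "Protein" occurrence.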
3 Pattern Operator Processing
InfoSearch computations differ from the algorithms used in a streaming system. In a
streaming system [5], the operators read the data source sequentially and pass simple
pattern occurrences to the respective PDG nodes as they occur while the data is being
read. In other words, the input to a leaf node is a single tuple, not a set of tuples.
Because the data is read sequentially, simple patterns are detected in their order of
occurrence in the data source. As a result, at any operator, the initiator is always
available when the terminator arrives. The occurrences can then be combined and
propagated, or discarded, as per the semantics of the operator.
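The streaming behaviour described above can be sketched as a Python generator. This is a hedged illustration rather than the algorithm of [5]: the `Event` layout and the name `stream_followed_by` are hypothetical, and the policy of keeping only the most recent initiator per document and consuming it on a match is an assumption made for brevity.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (pattern, document, position): one tuple arriving at a time (assumed layout).
Event = Tuple[str, str, int]


def stream_followed_by(events: Iterable[Event],
                       initiator: str,
                       terminator: str) -> Iterator[Tuple[str, int, int]]:
    """Emit (doc, start, end) whenever a terminator follows an initiator.

    Events are assumed to arrive in their order of occurrence in the
    data source, so a buffered initiator always precedes the terminator.
    """
    pending: Dict[str, int] = {}  # doc -> position of the latest initiator
    for pattern, doc, pos in events:
        if pattern == initiator:
            pending[doc] = pos                # remember the initiator
        elif pattern == terminator:
            if doc in pending:
                # Combine and propagate; the initiator is consumed (assumption).
                yield (doc, pending.pop(doc), pos)
            # Otherwise no initiator has been seen: the terminator is discarded.


events = [("Protein", "D1", 10), ("Clustering", "D1", 12)]
matches = list(stream_followed_by(events, "Protein", "Clustering"))
```

Because the events are consumed in order, the operator never has to sort or compare positions: the arrival order alone decides which occurrence is the initiator and which is the terminator.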
However, in InfoSearch the entire result set corresponding to a pattern is propagated
at once, because the text is already stored. This means that the relative order of
occurrence of the operands is lost: each operand is a set containing all occurrences
of the corresponding pattern in the document collection. Hence, to generate correct
results, the InfoSearch operators need to restore the order in which the patterns occur
in the original document. This is crucial for determining which operand is the
initiator and which one is the terminator. Only when the relative order of occurrence
 