Database Reference
[Figure: Performance of stream-based detection vs. index-based detection. Series: InfoFilter, InfoSearch. x-axis: Query type; y-axis ticks from 0 to 700.]
Fig. 7. Comparison of system performance over 2600 words
Fig. 8. Response time of InfoSearch (IS) and InfoFilter (IF) systems, in milliseconds, for each operator
have created a Java program called DocumentIndexer that reads every document in a
given folder and builds an inverted index over those documents. For every keyword in
each document, it stores a “hit” in the inverted index; each hit records the path of the
document containing the keyword and the position of the keyword in that document.
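The hit structure described above can be illustrated with a minimal sketch. This is not the DocumentIndexer source, which is not shown here: the class name InvertedIndexSketch, the Hit class, the build method, and the tokenization rule (lowercased runs of letters and digits, with positions counted in words) are all assumptions made for illustration. To stay self-contained, the sketch indexes in-memory strings keyed by path instead of reading a folder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an inverted index that stores (path, position) hits.
class InvertedIndexSketch {

    /** One occurrence of a keyword: the document path and the word position. */
    static final class Hit {
        final String path;
        final int position;

        Hit(String path, int position) {
            this.path = path;
            this.position = position;
        }

        @Override
        public String toString() {
            return path + "@" + position;
        }
    }

    /** Builds keyword -> hits from a map of document path -> document text. */
    static Map<String, List<Hit>> build(Map<String, String> documents) {
        Map<String, List<Hit>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Assumption: a keyword is a lowercased run of letters and digits.
            String[] tokens =
                doc.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            int position = 0;
            for (String token : tokens) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                index.computeIfAbsent(token, k -> new ArrayList<>())
                     .add(new Hit(doc.getKey(), position));
                position++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Illustrative documents only; the paths are made up.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("docs/a.txt", "Oil prices rose as oil supply fell");
        docs.put("docs/b.txt", "Supply of grain rose");
        System.out.println(build(docs).get("oil"));
    }
}
```

Storing the position alongside the path is what allows operators over the index to check ordering or proximity between keywords without re-reading the documents.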
5 Experimental Results
The primary reason for developing operators and algorithms to detect complex patterns
over indexed data was to support efficient searching of stored documents for such
patterns. It does not make sense to stream documents that are already stored and use
InfoFilter [5] to detect patterns in them. Since InfoSearch uses an index-based approach,
it is expected to be efficient for large volumes of data. A set of 20 documents of around
1.5 KB each was selected from the Reuters-21578 dataset (see footnote 3), and the
documents were artificially
3 Available at http://www.daviddlewis.com/resources/testcollections/reuters21578
 