Searching for Complex Patterns over Large Stored Information Repositories - Advances in Databases - page 80


[Chart: Performance of stream-based detection vs. index-based detection. Series: InfoFilter, InfoSearch. x-axis: Query type. y-axis range: 0 to 700.]

Fig. 7. Comparison of system performance over 2600 words

Fig. 8. Response Time of InfoSearch (IS) and InfoFilter (IF) Systems in milliseconds for each Operator

have created a Java program called DocumentIndexer that takes a folder of documents, reads them, and builds an inverted index over them. For every keyword in each document, it stores a “hit” in the inverted index. Each hit contains the path of the document the keyword comes from and the position of the keyword in that document.
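The hit structure described above can be sketched in a few lines of Java. This is a minimal illustration, not the authors' DocumentIndexer: the class name `InvertedIndexSketch`, the `Hit` type, the method names, the whitespace tokenization, and the lowercase normalization are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only; not the authors' DocumentIndexer.
class InvertedIndexSketch {

    // One occurrence of a keyword: source document path and word position.
    static final class Hit {
        final String path;
        final int position;

        Hit(String path, int position) {
            this.path = path;
            this.position = position;
        }
    }

    // keyword -> all hits for that keyword, in the order they were added
    private final Map<String, List<Hit>> index = new HashMap<>();

    // Splits the text on whitespace and records one hit per keyword occurrence.
    void addDocument(String path, String text) {
        String[] words = text.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            // Assumed normalization: lowercase, drop non-alphanumeric characters.
            String keyword = words[pos].toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
            if (!keyword.isEmpty()) {
                index.computeIfAbsent(keyword, k -> new ArrayList<>())
                     .add(new Hit(path, pos));
            }
        }
    }

    // Indexes every regular file directly inside the folder (no recursion).
    void indexFolder(Path folder) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(folder)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            addDocument(file.toString(), text);
        }
    }

    // Returns the hits for a keyword, or an empty list if it was never indexed.
    List<Hit> lookup(String keyword) {
        return index.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

Storing the word position alongside the document path is what lets an index answer questions about keyword order or proximity without rereading the document.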

5 Experimental Results

The primary reason for developing operators and algorithms to detect complex patterns over indexed data was to support efficient searching of stored documents for such patterns. It makes little sense to stream documents that are already stored and use InfoFilter [5] to detect patterns in them. Since InfoSearch uses an index-based approach, it is expected to be efficient for large volumes of data. A set of 20 documents of around 1.5 KB each was selected from the Reuters-21578 dataset³ and the documents were artificially

³ Available at http://www.daviddlewis.com/resources/testcollections/reuters21578
