Conclusion - Approximate String Processing

9 Conclusion

Approximate string matching is a fundamental operation in text data management. This work presented algorithms for selection and join queries using set-based and edit-based similarity measures and arbitrary token weighting schemes. There exists a large body of work in text data management related to approximate string processing that was not covered here. An important aspect of approximate string matching is to efficiently incorporate domain knowledge into the search mechanism. Domain knowledge can significantly affect search results and improve query relevance in a variety of settings. Specifically, the context of a query is of key importance in the similarity between strings. For example, previous work has recognized the significance of synonyms in many text search applications [4, 5]. In most cases, synonyms are context-dependent. For example, 'Avenue of the Americas' and '6th Avenue' are synonyms only in the context of New York City. Other types of domain knowledge can help index and query the data more efficiently. For example, the inherent hierarchical structure of mailing addresses is an important factor for indexing such data. Given that the vast majority of address-based queries focus on particular cities or states, a simple partitioning of the data (with possible spatial overlapping) on
