9 Conclusion
Approximate string matching is a fundamental operation in text data
management. This work presented algorithms for selection and join
queries using set-based and edit-based similarity measures and arbitrary
token weighting schemes. There exists a large body of work in text data
management related to approximate string processing that was not covered
here. An important aspect of approximate string matching is to
efficiently incorporate domain knowledge into the search mechanism.
Domain knowledge can significantly affect search results and improve
query relevance in a variety of settings. Specifically, the context of a
query is of key importance in determining the similarity between strings.
For example, previous work has recognized the significance of synonyms in
many text search applications [4, 5]. In most cases, synonyms are
context-dependent. For example, 'Avenue of the Americas' and '6th Avenue'
are synonyms only in the context of New York City. Other types of
domain knowledge can help index and query the data more efficiently.
For example, the inherent hierarchical structure of mailing addresses
is an important factor for indexing such data. Given that the vast
majority of address-based queries focus on particular cities or states, a
simple partitioning of the data (with possible spatial overlapping) on