Misspelled words: If a user misspells a keyword in a search form and the entered
word isn't in the dictionary, the search engine might return a “Did you mean”
panel with spelling alternatives for the keyword. This feature requires the
search engine to find words similar to the misspelled word.
Not all NoSQL databases support all of these features. But this list is a good starting
point if you're comparing two distinct NoSQL systems. Next we look at one type of
NoSQL database, the document store, that lends itself to high-quality search.
7.4 Using document structure to improve search quality
In chapter 4 we introduced the concept of document stores. You may recall that
document stores keep data elements together in a single object. Document stores
don't “shred” elements into rows within tables; they keep all information
together in a single hierarchical tree.
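The difference between keeping a document whole and shredding it can be sketched in a few lines of Python. This is only an illustration: the sample article, the shred helper, and the table and column names are all hypothetical, and a real relational design would differ in its details.

```python
# One document kept whole, as a document store would hold it (hypothetical data).
article = {
    "id": "doc-1",
    "title": "Search basics",
    "sections": [
        {"heading": "Indexes", "paragraphs": ["An index maps words to documents."]},
        {"heading": "Ranking", "paragraphs": ["Ranking orders the matches."]},
    ],
}


def shred(doc):
    """Flatten the tree into (table, row) pairs, as a relational design might."""
    rows = [("article", {"id": doc["id"], "title": doc["title"]})]
    for s_no, section in enumerate(doc["sections"]):
        rows.append(("section", {
            "article_id": doc["id"],
            "section_no": s_no,
            "heading": section["heading"],
        }))
        for p_no, text in enumerate(section["paragraphs"]):
            rows.append(("paragraph", {
                "article_id": doc["id"],
                "section_no": s_no,
                "paragraph_no": p_no,
                "text": text,
            }))
    return rows


rows = shred(article)
for table, row in rows:
    print(table, row)
```

The single nested object becomes several rows spread over three tables, and the tree has to be rebuilt from the key columns before anyone can ask where in the document a piece of text sits.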
Document stores are popular for search because this retained structure can be
used to pinpoint exactly where in the document a keyword match is found. Knowing
the position of a keyword match can make a big difference when you're trying to
find a single document in a large collection of documents.
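As a rough sketch of what pinpointing a match can look like, the following Python function walks a nested document and reports the path of every text node that contains a keyword. The find_matches name, the sample document, and the path notation are assumptions made for illustration, not the API of any particular document store.

```python
import re


def find_matches(node, keyword, path="$"):
    """Return the path of every text node that contains the keyword."""
    hits = []
    if isinstance(node, dict):
        for key, child in node.items():
            hits += find_matches(child, keyword, f"{path}.{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            hits += find_matches(child, keyword, f"{path}[{i}]")
    elif isinstance(node, str):
        # Compare whole lowercase words, so "new" does not match "news".
        if keyword.lower() in re.findall(r"[a-z]+", node.lower()):
            hits.append(path)
    return hits


# Hypothetical document with a title and two sections.
doc = {
    "title": "Fear of the new",
    "sections": [
        {"heading": "Origins", "body": "Love and hate are old themes."},
        {"heading": "Today", "body": "New fear, old love."},
    ],
}

matches = find_matches(doc, "fear")
print(matches)
```

Because each hit carries its path, a search engine can tell a match in the title apart from a match deep inside a section body.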
If you retain the structure of the document, you can in effect treat each part of a
large document as though it were another document. You can then assign different
search result scores based on where in the document each keyword was found.
Figure 7.3 shows how document stores leverage a retained structure model to create
better search results.
[Figure 7.3 panels. Left, "Bag-of-words search": all keywords are in a single
container, and only count frequencies are stored with each word. Right,
"Retained structure search": keywords are associated with each subdocument
component. Labels in the figure: doc-id; sample keywords 'Love', 'Hate', 'New',
'Fear'.]
Figure 7.3 Comparison of two types of document structures used in search. The
left panel is the bag-of-words search based on an extraction of all words in a
document without consideration of where words occur in the document
structure. The right panel shows a retained structure search that treats each
node in the document tree as a separate document. This allows keyword matches
in the title to have a higher rank than keywords in the body of a document.
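The contrast in figure 7.3 can be approximated in a short Python sketch. Everything here is assumed for illustration: the sample document, the component names, and especially the weights, which a real search system would tune for its own collection.

```python
import re
from collections import Counter

# Hypothetical document tree with a title and two sections.
DOC = {
    "title": "Love in the New World",
    "sections": [
        {"heading": "Fear", "body": "Fear and hate shaped the new colony."},
        {"heading": "Hope", "body": "Love replaced hate over time."},
    ],
}

# Assumed weights: a match in the title counts more than one in the body.
WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def walk(doc):
    """Yield (component, text) pairs for each text node in the tree."""
    yield "title", doc["title"]
    for section in doc["sections"]:
        yield "heading", section["heading"]
        yield "body", section["body"]


def bag_score(doc, keyword):
    """Bag-of-words: one counter for the whole document, position ignored."""
    counts = Counter()
    for _, text in walk(doc):
        counts.update(tokenize(text))
    return counts[keyword.lower()]


def structured_score(doc, keyword):
    """Retained structure: count per component, then weight by component."""
    index = {}
    for component, text in walk(doc):
        index.setdefault(component, Counter()).update(tokenize(text))
    return sum(WEIGHTS[c] * counts[keyword.lower()] for c, counts in index.items())


for word in ("love", "hate"):
    print(word, bag_score(DOC, word), structured_score(DOC, word))
```

In this sample, "love" and "hate" each occur twice, so the bag-of-words score can't separate them. The structured score ranks "love" higher because one of its matches is in the title.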
 