Finding information with NoSQL search - Making Sense of NoSQL


Let's assume you're searching for books on the topic of NoSQL. When you go to a publisher's website and type NoSQL into a keyword search form, you'll get many matches. The search system can find the keyword NoSQL in several places in each book:

- In the title of a book or the title of a chapter
- In a glossary term or back-of-book index term
- In the body text of a book
- In a bibliographic reference

As you can guess, if a book has the keyword NoSQL in its title, there's a good chance that the entire book is about NoSQL. On the other hand, there may be related books that have a chapter on NoSQL, and a larger set of books that mention the term NoSQL only in the text or in a bibliographic reference. When the search system returns results to the user, it makes sense to give matches in a book title the highest score and matches in a chapter title the second-highest score. A match in a glossary term or index term might come next, followed by a match in body text. Matches in a bibliographic reference would rank last.

The business rules for raising the search score based on where in a document the word is found are called boosting. If you have no way to mark up and find book and chapter titles within your documents, it'll be difficult to boost their ranking. Using a larger font or a different font color won't help search tools find the right documents. This is why structured document formats such as DocBook can produce higher-precision search rankings than bag-of-words patterns.
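The boosting rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular search engine implements boosting: the field names, the weight values in `BOOSTS`, and the `boosted_score` and `search` helpers are all assumptions made up for this example. Each document is a plain dictionary whose keys stand in for structural elements that a format such as DocBook would let you identify.

```python
# Hypothetical boost weights (assumed values): a match in a more
# significant structural field contributes more to the score.
BOOSTS = {
    "book_title": 5.0,
    "chapter_title": 3.0,
    "index_term": 2.0,
    "body": 1.0,
    "bibliography": 0.5,
}


def boosted_score(doc, keyword):
    """Return the sum of boost * occurrence count over all known fields."""
    needle = keyword.lower()
    score = 0.0
    for field_name, boost in BOOSTS.items():
        text = doc.get(field_name, "")
        # Case-insensitive count of the keyword within this field.
        score += boost * text.lower().count(needle)
    return score


def search(docs, keyword):
    """Return ids of matching documents, highest boosted score first."""
    scored = [(boosted_score(doc, keyword), doc["id"]) for doc in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in hits]


# Made-up sample documents for illustration only.
docs = [
    {"id": "c", "book_title": "Graph Theory", "bibliography": "Making Sense of NoSQL"},
    {"id": "b", "book_title": "Big Data Basics", "chapter_title": "NoSQL search"},
    {"id": "a", "book_title": "Making Sense of NoSQL", "body": "An overview."},
    {"id": "d", "book_title": "Cooking at Home", "body": "Recipes."},
]
ranking = search(docs, "NoSQL")
```

With these assumed weights, the book with NoSQL in its title ranks first, the book with a matching chapter title second, and the book that only cites a NoSQL reference last. A document with no structure to inspect would have to be scored on body text alone, which is the bag-of-words situation described above.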

You can see how easy it is to improve your search results by using a document's original structure. As we move to our next section, you'll see how measuring search quality will help you compare NoSQL options.

7.5 Measuring search quality

Accurately measuring search quality is an important step in selecting a NoSQL database. From a quality perspective, you want your results to contain the search key and to be ranked accurately. To measure search quality, you use two metrics: precision (the fraction of returned results that are relevant) and recall (the fraction of relevant documents that are returned). As you'll see, combining these metrics will help you objectively measure the quality of your search results.

An illustration of search quality is shown in figure 7.4.

Your goal is to maximize both precision and recall. A metric called the F-measure combines the two as their harmonic mean (roughly their average when the two values are close), and a larger F-measure indicates higher search quality.
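As a rough sketch of how these metrics could be computed, the Python function below compares the set of target documents (those that should be found) with the set of actual search results. The function name `search_quality` and the sample document ids are assumptions for illustration, and the F-measure shown is the common balanced form (often called F1), which weights precision and recall equally.

```python
def search_quality(target, actual):
    """Return (precision, recall, f_measure) for one search.

    target: ids of the documents that should have been returned
    actual: ids of the documents the search actually returned
    """
    target, actual = set(target), set(actual)
    hits = target & actual  # relevant documents that were returned

    # Precision: share of returned results that are relevant.
    precision = len(hits) / len(actual) if actual else 0.0
    # Recall: share of relevant documents that were returned.
    recall = len(hits) / len(target) if target else 0.0

    # Balanced F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: four target documents, five returned results,
# three of which are relevant (d4 is a missed document).
target_docs = {"d1", "d2", "d3", "d4"}
actual_results = {"d1", "d2", "d3", "d9", "d10"}
precision, recall, f_measure = search_quality(target_docs, actual_results)
```

In this made-up example, three of the five returned results are relevant and three of the four target documents are found, so precision is 3/5 and recall is 3/4. Because the harmonic mean is pulled toward the smaller value, a search cannot earn a high F-measure by doing well on only one of the two metrics.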

[Figure 7.4 diagram labels: target documents, actual search results, missed document]