Let's assume you're searching for books on the topic of NoSQL. When you go to a
publisher's website and type NoSQL into a keyword search form, you'll get many
matches. The search system finds the keyword NoSQL in multiple places in each book:
In the title of a book or the title of a chapter
In a glossary term or back-of-book index term
In the body text of a book
In a bibliographic reference
As you can guess, if a book has the keyword NoSQL in its title, there's a good chance
that the entire book is about NoSQL. On the other hand, there may be related books
that have a chapter on NoSQL and a larger set of books that mention the
term NoSQL in the text or in a bibliographic reference. When the search system
returns the results to the user, it would make sense to give matches in a book title
the highest score and matches in a chapter title the second-highest score. A match
in a glossary or index term might be next, followed by a match in body
text. Matches in a bibliographic reference would rank last.
The business rules for raising the search score based on where in a document the
word is found are called boosting. If you have no way to specify and find book and
chapter titles within your documents, it'll be difficult to boost their ranking. Using
a larger font or a different font color won't help search tools find the right
documents, because visual styling doesn't tell them which text is a title. This is
why using structured document formats such as DocBook can produce higher-precision
search rankings than bag-of-words approaches.
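As a rough illustration of boosting, the following Python sketch scores a keyword match by the field in which it occurs. The field names, the weight values, and the sample documents are all hypothetical; only the ordering of the weights (title, then chapter title, then index term, then body text, then bibliography) follows the text above. Real search engines use more elaborate scoring, so treat this as a minimal sketch rather than how any particular product works.

```python
# Hypothetical boost weights. Only their ordering comes from the text;
# the numbers themselves are illustrative.
BOOSTS = {
    "book_title": 5.0,
    "chapter_title": 3.0,
    "index_term": 2.0,
    "body": 1.0,
    "bibliography": 0.5,
}


def boosted_score(document, keyword):
    """Add up the boost of every field that contains the keyword."""
    needle = keyword.lower()
    return sum(
        weight
        for field, weight in BOOSTS.items()
        if needle in document.get(field, "").lower()
    )


def rank(documents, keyword):
    """Return the ids of matching documents, highest score first."""
    scored = [(boosted_score(doc, keyword), doc["id"]) for doc in documents]
    hits = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so documents with equal scores keep their input order.
    return [doc_id for _, doc_id in sorted(hits, key=lambda pair: -pair[0])]


# Made-up sample documents with explicitly labeled fields.
books = [
    {"id": "cites-nosql", "book_title": "Relational Design",
     "bibliography": "A reference to a NoSQL book"},
    {"id": "about-nosql", "book_title": "NoSQL Basics",
     "body": "NoSQL stores trade some features for scale."},
    {"id": "chapter-on-nosql", "book_title": "Data Systems",
     "chapter_title": "An Overview of NoSQL"},
    {"id": "unrelated", "book_title": "Gardening"},
]

ranking = rank(books, "NoSQL")
print(ranking)
```

The sketch only works because each document labels its fields explicitly. With a flat bag of words there would be no way to tell a title match from a match in a bibliographic reference.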
You can see how easy it is to improve your search results by using a document's
original structure. As we move to our next section, you'll see how measuring search
quality will help you compare NoSQL options.
7.5 Measuring search quality
Accurately measuring search quality is an important step in selecting a NoSQL
database. From a quality perspective, you want your results to contain the search
keyword and to be ranked accurately. To measure search quality, you use two metrics:
precision and recall. As you'll see, combining these metrics will help you
objectively measure the quality of your search results.
An illustration of search quality is shown in figure 7.4.
Your goal is to maximize both precision and recall. A metric called the F-measure
combines the two as their harmonic mean, so a larger F-measure indicates higher
search quality.
[Figure 7.4: diagram with regions labeled "Actual search results", "Target documents", and "Other documents", and one target document labeled "Missed document"]
Figure 7.4 Search precision and recall. Precision is the fraction of documents in
the actual search results that are target documents. Two of the four documents in
the actual search results are in the target area, for a precision of 0.5. Recall is
the fraction of all target documents (darker dots) found in your actual search
results. In this example, only two of the three darker dots are in the actual search
results, for a recall of 0.67.
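The numbers in figure 7.4 can be checked with a short Python sketch. The document ids below are made up to mirror the figure (four returned documents, three target documents, two in common), and the function name is hypothetical. The F-measure computed here is the balanced form, usually written F1, which is the harmonic mean of precision and recall.

```python
def precision_recall_f1(returned, targets):
    """Compute precision, recall, and the balanced F-measure (F1)."""
    returned, targets = set(returned), set(targets)
    hits = len(returned & targets)  # target documents that were returned

    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(targets) if targets else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ids mirroring figure 7.4: t* are target documents, o* are others.
returned = {"t1", "t2", "o1", "o2"}  # actual search results
targets = {"t1", "t2", "t3"}         # t3 is the missed document

p, r, f = precision_recall_f1(returned, targets)
print(round(p, 2), round(r, 2), round(f, 2))  # prints: 0.5 0.67 0.57
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a search system can't earn a high F-measure by maximizing recall alone (for example, by returning every document) while letting precision collapse.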