Organizations that sell search services have dedicated quality teams that continually
monitor and modify search ranking algorithms to get higher F-measures. They
develop processes that detect which search results users click on and look for ways to
automatically increase the ranking score of relevant items while lowering the ranking
score of items that users deem to be unrelated to their query.
Each search system has a way of shifting the balance between precision and recall by
including a broader set of documents in search results. For example, a search engine
can look up synonyms for a keyword and return all documents that contain either the
keyword or any of its synonyms. Adding more documents to the search results tends to
lower precision and raise recall. It's important to strike a balance between precision
and recall that fits your system requirements.
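The trade-off above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of any real search engine: the five-document corpus, the hand-labeled `RELEVANT` set, the `SYNONYMS` table, and the helper names `search` and `scores`. The F-measure used here is the balanced F1 score, the harmonic mean of precision and recall.

```python
# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "used car for sale",
    2: "automobile repair manual",
    3: "car rental prices",
    4: "toy vehicle for kids",
    5: "train timetable",
}
# Assumed human relevance judgments for the query "car".
RELEVANT = {1, 2, 3}
# Assumed synonym table; a real system would use a thesaurus or ontology.
SYNONYMS = {"car": {"automobile", "vehicle"}}


def search(keyword, expand=False):
    """Return ids of documents containing the keyword or, if expand is set, any synonym."""
    terms = {keyword}
    if expand:
        terms |= SYNONYMS.get(keyword, set())
    return {doc_id for doc_id, text in DOCS.items() if terms & set(text.split())}


def scores(returned, relevant):
    """Compute (precision, recall, F1) for a set of returned document ids."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


for expand in (False, True):
    precision, recall, f1 = scores(search("car", expand), RELEVANT)
    print(f"expand={expand}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With this toy data, the plain keyword search returns only relevant documents but misses one of them, while the synonym-expanded search finds every relevant document at the cost of also returning one that is irrelevant.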
Not all database selection projects will take the time to carefully measure the precision
and recall of competing systems. Setting up a large collection of documents and
measuring the relevancy of ranked search results can be time-consuming and difficult to
automate. But by retaining document structure, document stores have shown dramatic
gains in both precision and recall.
Now that we've covered the types of searches and how NoSQL systems speed up
these searches, we can compare how distributed systems use different strategies to
store indexes used in search optimization.
7.6 In-node indexes versus remote search services
There are two ways that NoSQL systems store search indexes: in in-node indexes or in a
remote search service. Most NoSQL systems keep their data and indexes on the same
node. But some NoSQL systems use external search services for full-text search. These
systems keep the full-text indexes on a remote cluster and use a search API to generate
search results. Since most NoSQL systems use one method or the other, understanding
the trade-offs of each method will help you evaluate NoSQL options. Figure 7.5
illustrates these two options.
[Figure 7.5. Left panel, "In-node index": a NoSQL cluster in which each node holds both data and index. Right panel, "Remote search service": a NoSQL cluster of data nodes alongside a separate full-text search cluster that holds the indexes.]
Figure 7.5 Integrated search vs. search services. The panel on the left shows how
NoSQL systems store the indexes on the same node as the indexed data. The panel on
the right shows a remote search service where indexes are stored on a remote cluster
that executes a search service through an API.
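The two layouts in figure 7.5 can be sketched in a few lines of code. This is a minimal in-memory illustration, not the design of any particular NoSQL product: the class names `InNodeIndexNode`, `DataOnlyNode`, and `RemoteSearchService`, the whitespace tokenizer, and the modulo sharding are all assumptions made for the example, and the "remote" service is an ordinary object standing in for a cluster that would be reached over a network API.

```python
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


class InNodeIndexNode:
    """In-node layout: the node stores documents and the index that covers them."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # term -> ids of local documents

    def put(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.index[term].add(doc_id)

    def search(self, term):
        return set(self.index.get(term.lower(), set()))


class RemoteSearchService:
    """Stand-in for a separate full-text search cluster behind a search API."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> ids of documents on any data node

    def index_document(self, doc_id, text):
        for term in tokenize(text):
            self.index[term].add(doc_id)

    def search(self, term):
        return set(self.index.get(term.lower(), set()))


class DataOnlyNode:
    """Remote layout: the node stores documents and forwards text to the service."""

    def __init__(self, search_service):
        self.docs = {}
        self.search_service = search_service

    def put(self, doc_id, text):
        self.docs[doc_id] = text
        self.search_service.index_document(doc_id, text)


def in_node_search(nodes, term):
    """Ask every node to search its own index, then merge the partial results."""
    results = set()
    for node in nodes:
        results |= node.search(term)
    return results


# Load the same documents into both layouts, sharding by doc_id modulo node count.
documents = {0: "NoSQL search indexes", 1: "remote search service",
             2: "document stores", 3: "graph stores"}
index_nodes = [InNodeIndexNode() for _ in range(3)]
service = RemoteSearchService()
data_nodes = [DataOnlyNode(service) for _ in range(3)]
for doc_id, text in documents.items():
    index_nodes[doc_id % 3].put(doc_id, text)
    data_nodes[doc_id % 3].put(doc_id, text)

print(sorted(in_node_search(index_nodes, "search")))  # in-node layout
print(sorted(service.search("search")))               # remote service layout
```

In this sketch both layouts return the same document ids. The difference is where the work happens: the in-node layout queries every data node and merges the results, while the remote layout answers from the search cluster and depends on each write being forwarded to it.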
 