Finding information with NoSQL search - Making Sense of NoSQL

When you use an in-node search system, the reverse indexes are located on the same

node as the data. This allows you to send a query to each node and have it respond with

the search results without having to do any additional input/output to include search

information. If you retain document structure, you can also use structural match rules

to change the query results based on where in a document a match occurs.
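The in-node arrangement can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's API: the `Node` class, the `search_cluster` function, and the `field_weights` parameter are hypothetical names chosen for this example. Each node keeps its reverse index next to its own documents, and the weight of a match depends on the field in which it occurs, which stands in for a structural match rule.

```python
from collections import defaultdict


class Node:
    """One database node holding its documents and a local reverse index."""

    def __init__(self):
        self.docs = {}                 # doc_id -> {field name: text}
        self.index = defaultdict(set)  # word -> {(doc_id, field name)}

    def put(self, doc_id, doc):
        # Sketch only: entries from an older version of the document
        # are not removed from the index here.
        self.docs[doc_id] = doc
        for field, text in doc.items():
            for word in text.lower().split():
                self.index[word].add((doc_id, field))

    def search(self, word, field_weights):
        """Score local matches; the weight depends on where the match occurs."""
        scores = defaultdict(float)
        for doc_id, field in self.index.get(word.lower(), ()):
            scores[doc_id] += field_weights.get(field, 1.0)
        return dict(scores)


def search_cluster(nodes, word, field_weights):
    """Send the query to every node and merge the locally ranked results."""
    merged = {}
    for node in nodes:
        merged.update(node.search(word, field_weights))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

Because every node answers from its own index, the only data that crosses the network is the query and the list of scored document IDs.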

In contrast, a search service sends documents to an external search cluster when

they need to be indexed. This is usually done by a collection trigger that's fired when

any document is added or updated in the database. Even if a single word within a

document is altered, the entire document is sent to the remote service and re-indexed.

When a search is performed, the keywords in the search are sent to the remote system

and all document IDs that match the search are returned. Note that the actual

documents aren't returned. Only a list of the document IDs and their ranking scores is

returned. Apache Solr and ElasticSearch are both examples of software that can be

configured as a remote search service.
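The flow described above can be sketched as follows. This is an in-memory stand-in under stated assumptions, not the Solr or ElasticSearch API: `RemoteSearchService`, `Collection`, and their methods are hypothetical names, and the score is a simple term count. The sketch shows the two behaviors the text describes: the trigger sends the entire document on every save, and a search returns only document IDs with scores.

```python
class RemoteSearchService:
    """Stand-in for an external search cluster (hypothetical, in-memory)."""

    def __init__(self):
        self.index = {}  # word -> {doc_id: number of occurrences}

    def index_document(self, doc_id, text):
        # Re-index the whole document: drop its old postings, then add new ones.
        for postings in self.index.values():
            postings.pop(doc_id, None)
        for word in text.lower().split():
            postings = self.index.setdefault(word, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, keywords):
        # Return only (doc_id, score) pairs, never the documents themselves.
        scores = {}
        for word in keywords.lower().split():
            for doc_id, count in self.index.get(word, {}).items():
                scores[doc_id] = scores.get(doc_id, 0) + count
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


class Collection:
    """Database collection whose trigger fires on every add or update."""

    def __init__(self, search_service):
        self.docs = {}
        self.search_service = search_service

    def save(self, doc_id, text):
        self.docs[doc_id] = text
        # Trigger: send the entire document to the remote service,
        # even if only one word changed.
        self.search_service.index_document(doc_id, text)
```

A caller would take the IDs returned by `search` and fetch the matching documents from the database in a separate step.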

Let's look at the various trade-offs of these two approaches.

Advantages of in-node index architecture:

• Lower network usage; documents aren't sent between clusters, resulting in higher performance

• Ideal for large documents that have many small and frequent changes

• Better fine-grained control over search results on structured documents

Advantages of remote service architecture:

• Ability to take advantage of prebuilt and pretested components for standard functions such as creating and maintaining full-text search indexes

• Easier to upgrade to new features of remote search services

• Ideal for documents that are added once without frequent updates

These are high-level guidelines, and each NoSQL system or version might have

exceptions to these rules. As the lists show, how often you update documents affects

which architecture is right for you. Whichever architecture you select, we

recommend that you take the time to test a configuration that closely matches your

business challenge.

The next section looks at one way to speed up initial document indexing and the

creation of the reverse indexes that support full-text search.

7.7 Case study: using MapReduce to create reverse indexes

One of the most time-consuming parts of building any search system is creating the

reverse indexes for new full-text documents as they're imported into your NoSQL

database. A typical 20-page document with 5,000 words can result in 5,000 additions to

your reverse index. Indexing 1,000 documents into your collection would require

approximately five million index updates. Spreading the load of this process over

multiple servers is the best way to index large document collections.
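As a rough illustration of how this work divides, the sketch below builds a reverse index with separate map, shuffle, and reduce steps. It runs in a single process and assumes simple whitespace tokenization; the function names are hypothetical and do not belong to any MapReduce framework. The map step emits one pair per word occurrence, so a 5,000-word document yields 5,000 pairs, and 1,000 such documents yield the five million updates mentioned above.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map phase: emit one (word, doc_id) pair per word occurrence."""
    return [(word, doc_id) for word in text.lower().split()]


def shuffle(pairs):
    """Group emitted pairs by word, as a framework would between phases."""
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        grouped[word].append(doc_id)
    return grouped


def reduce_word(word, doc_ids):
    """Reduce phase: collapse one word's doc IDs into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_reverse_index(docs):
    """Run the three phases in-process; docs maps doc_id -> text."""
    pairs = []
    for doc_id, text in docs.items():
        # Each map call is independent, so in a real cluster these calls
        # could run on different servers.
        pairs.extend(map_document(doc_id, text))
    return dict(reduce_word(word, ids) for word, ids in shuffle(pairs).items())
```

Since each map call touches only one document and each reduce call touches only one word, both phases can be spread across servers without coordination beyond the shuffle.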
