When you use an in-node search system, the reverse indexes are located on the same node as the data. You can send a query to each node, and each node responds with its search results. No node needs extra input/output to fetch search information from elsewhere. If you retain document structure, you can also use structural match rules to adjust query results based on where in a document a match occurs.
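The sketch below shows the in-node idea in miniature. Each node keeps its own documents next to a local reverse index. A query is broadcast to every node and answered from local data only. A structural match rule, expressed as per-element weights, boosts hits depending on where they occur.

It makes several assumptions, and none of them come from any particular NoSQL product:
- Documents are modeled as dicts that map an element name to its text.
- The names `Node`, `cluster_search` and the `weights` parameter are hypothetical.
- Scoring is deliberately simplistic.

```python
from collections import defaultdict


class Node:
    """Hypothetical database node holding documents plus a local reverse index.

    Documents are assumed to be dicts of element name -> text.
    """

    def __init__(self):
        self.docs = {}
        # term -> set of (doc_id, element) pairs; repeated occurrences of a
        # term inside the same element are counted once
        self.index = defaultdict(set)

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        for element, text in doc.items():
            for term in text.lower().split():
                self.index[term].add((doc_id, element))

    def search(self, term, weights):
        # Structural match rule: a hit's score depends on which element it is in.
        # Only local data is read, so no extra I/O to another cluster is needed.
        scores = defaultdict(float)
        for doc_id, element in self.index.get(term.lower(), ()):
            scores[doc_id] += weights.get(element, 1.0)
        return dict(scores)


def cluster_search(nodes, term, weights):
    """Send the query to every node and merge the per-node results by score."""
    merged = {}
    for node in nodes:
        merged.update(node.search(term, weights))
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


n1 = Node()
n1.add("d1", {"title": "NoSQL search", "body": "reverse indexes"})
n2 = Node()
n2.add("d2", {"title": "MapReduce", "body": "search at scale"})
print(cluster_search([n1, n2], "search", {"title": 2.0}))  # [('d1', 2.0), ('d2', 1.0)]
```

Here a match in `title` is weighted twice as heavily as one in `body`. That is one way a structure-aware engine can rank the same keyword differently depending on its position.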
In contrast, a search service sends documents to an external search cluster when they need to be indexed. This is usually done by a collection trigger that fires whenever a document is added or updated in the database. Even if a single word within a document is altered, the entire document is sent to the remote service and re-indexed. When a search is performed, the keywords in the search are sent to the remote system, and the IDs of all matching documents are returned. Note that the documents themselves aren't returned, only a list of document IDs and their ranking scores. Apache Solr and Elasticsearch are both examples of software that can be configured as a remote search service.
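The flow described above can be modeled with a small simulation. A collection trigger ships the whole document to a remote indexer on every add or update. A search returns only `(doc_id, score)` pairs, so the caller must fetch the documents from the database separately.

`RemoteSearchService` and `Collection` are hypothetical stand-ins. They are not the Solr or Elasticsearch APIs, and scoring is just a keyword count.

```python
from collections import Counter


class RemoteSearchService:
    """Hypothetical external search cluster (not a real Solr/Elasticsearch API)."""

    def __init__(self):
        self.term_counts = {}  # doc_id -> Counter of terms in that document
        self.bytes_received = 0

    def index(self, doc_id, text):
        # The entire document arrives and is re-indexed from scratch.
        self.bytes_received += len(text.encode("utf-8"))
        self.term_counts[doc_id] = Counter(text.lower().split())

    def search(self, keywords):
        # Returns only (doc_id, score) pairs, never the documents themselves.
        terms = [k.lower() for k in keywords]
        hits = []
        for doc_id, counts in self.term_counts.items():
            score = sum(counts[t] for t in terms)
            if score > 0:
                hits.append((doc_id, score))
        return sorted(hits, key=lambda h: (-h[1], h[0]))


class Collection:
    """Database collection whose add/update trigger calls the remote service."""

    def __init__(self, service):
        self.docs = {}
        self.service = service

    def put(self, doc_id, text):
        self.docs[doc_id] = text
        self._on_change(doc_id)

    def _on_change(self, doc_id):
        # Collection trigger: fires on every add or update, sending the whole document.
        self.service.index(doc_id, self.docs[doc_id])


svc = RemoteSearchService()
coll = Collection(svc)
coll.put("d1", "graph stores and document stores")
coll.put("d1", "graph stores and column stores")  # one word changed
print(svc.bytes_received)  # 62: both full versions were shipped
hits = svc.search(["stores"])
print(hits)  # [('d1', 2)]
docs = [coll.docs[doc_id] for doc_id, _ in hits]  # second round trip to the database
```

The one-word edit still costs a full resend. This is the behavior that makes remote services a better fit for documents that are written once and rarely updated.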
Let's look at the various trade-offs of these two approaches.
Advantages of in-node index architecture:
Lower network usage; documents aren't sent between clusters, resulting in higher performance
Ideal for large documents that have many small and frequent changes
Better fine-grained control over search results on structured documents
Advantages of remote service architecture:
Ability to take advantage of prebuilt and pretested components for standard functions such as creating and maintaining full-text search indexes
Easier to upgrade to new features of remote search services
Ideal for documents that are added once without frequent updates
These are high-level guidelines, and each NoSQL system or version might have exceptions to these rules. As you can see, how often you update documents affects which architecture is right for you. Whatever architecture you select, we recommend that you take the time to test a configuration that closely matches your business challenge.
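A quick calculation shows why update frequency matters so much for the remote approach. In a remote setup, the initial add and every later update each send the whole document to the search cluster. In an in-node setup, that cross-cluster index traffic is zero.

The function name and the workload figures below are made-up illustrations, not measurements.

```python
def remote_index_traffic(doc_bytes: int, updates: int) -> int:
    """Bytes shipped to a remote search service over one document's life.

    Assumes, as described above, that the initial add and every update
    each send the entire document, however small the change.
    """
    return doc_bytes * (1 + updates)


# Illustrative, hypothetical workloads for a 1 MB document:
large_edited = remote_index_traffic(doc_bytes=1_000_000, updates=200)
write_once = remote_index_traffic(doc_bytes=1_000_000, updates=0)
print(large_edited, write_once)  # 201000000 1000000
```

The frequently edited document costs about 200 times more index traffic than the write-once document of the same size.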
The next section looks at one way to speed up the initial document indexing process, including the creation of reverse indexes to support full-text search.
7.7 Case study: using MapReduce to create reverse indexes
One of the most time-consuming parts of building any search system is creating the reverse indexes for new full-text documents as they're imported into your NoSQL database. A typical 20-page document with 5,000 words can result in 5,000 additions to your reverse index. Indexing 1,000 such documents into your collection would therefore require approximately five million (1,000 × 5,000) index updates. Spreading this load over multiple servers is the best way to index large document collections.
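The general MapReduce pattern for building a reverse index can be sketched in a single process. The steps are:
- **Map:** each map task emits a `(word, doc_id)` pair for every word in its document.
- **Shuffle:** the framework groups the pairs by word.
- **Reduce:** each reduce task turns one word's group into a sorted posting list.

This is not the case study's code. The function names are hypothetical, and everything runs sequentially here. In a real cluster, the map and reduce tasks would run in parallel on many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Shuffle: group emitted doc_ids by word (done by the framework in practice)."""
    groups = defaultdict(list)
    for word, doc_id in pairs:
        groups[word].append(doc_id)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce: collapse one word's doc_ids into a sorted, de-duplicated posting list."""
    return word, sorted(set(doc_ids))


def build_reverse_index(docs):
    # Sequential stand-in for map tasks that would run on many servers in parallel.
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(w, ids) for w, ids in shuffle(mapped).items())


index = build_reverse_index({"d1": "NoSQL search", "d2": "search with MapReduce"})
print(index["search"])  # ['d1', 'd2']
```

With the figures above, the map phase would emit roughly 1,000 × 5,000 = 5,000,000 pairs. Because each document can be mapped independently, that work divides naturally across servers.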