Chapter 8
Advanced Queries
The chapters so far have covered the basic query mechanisms for finding one document or a series of documents that
match given criteria and bringing them back to your application for processing. Sometimes, however, these normal
query mechanisms fall short, and you want to perform complex operations over most or all of the documents in your
collection. When queries or operations of this kind are required, many developers either iterate through every
document in the collection or write a series of queries to be executed in sequence to perform the necessary
calculations. Although this is a valid way of doing things, it can be burdensome to write and maintain, as well as
inefficient. For these reasons, MongoDB provides some advanced query mechanisms that you can use to get the most
from your data. The advanced MongoDB features we'll examine in this chapter are full-text search, the aggregation
framework, and the MapReduce framework.
Full-text search is one of the most-requested features to be added to MongoDB. It is the ability to create
specialized text indexes in MongoDB and then perform text searches on those indexes to locate documents that
contain matching text elements. The MongoDB full-text search feature goes beyond simple string matching: it stems
words according to the language you have selected for your documents, which makes it an incredibly powerful tool for
performing language queries on your documents. This recently introduced feature is marked as “experimental” in the
2.4 releases of MongoDB because the development team is still working hard to improve it, which means you must
manually activate it for use in your MongoDB environment.
The second feature this chapter will cover is the MongoDB aggregation framework. Introduced in Chapters 4 and 6,
this feature provides a whole host of query functions that let you iterate over selected documents, or all of them,
gathering or manipulating information. These functions are arranged into a pipeline of operations that are
performed one after another on your collection, with each operation working on the output of the one before it.
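The pipeline idea can be sketched in plain Python. This is only a conceptual illustration, not MongoDB's API: the sample documents, the field names, and the helper functions match, group_sum, sort_by, and run_pipeline are all hypothetical names invented for this sketch.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are made up for illustration.
docs = [
    {"author": "ann", "pages": 120, "published": True},
    {"author": "bob", "pages": 80, "published": True},
    {"author": "ann", "pages": 200, "published": True},
    {"author": "bob", "pages": 50, "published": False},
]

def match(predicate):
    """Build a stage that keeps only documents for which predicate is true."""
    def stage(documents):
        return [d for d in documents if predicate(d)]
    return stage

def group_sum(key, field):
    """Build a stage that groups documents by key and sums field per group."""
    def stage(documents):
        totals = defaultdict(int)
        for d in documents:
            totals[d[key]] += d[field]
        return [{"_id": k, "total": v} for k, v in totals.items()]
    return stage

def sort_by(field, descending=False):
    """Build a stage that orders documents by the given field."""
    def stage(documents):
        return sorted(documents, key=lambda d: d[field], reverse=descending)
    return stage

def run_pipeline(documents, stages):
    """Feed the output of each stage into the next, one after another."""
    for stage in stages:
        documents = stage(documents)
    return documents

result = run_pipeline(docs, [
    match(lambda d: d["published"]),    # filter first
    group_sum("author", "pages"),       # then aggregate per author
    sort_by("total", descending=True),  # then order the groups
])
print(result)
```

Each stage sees only what the previous stage produced, which is why filtering early keeps the later stages cheap.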
The third and final feature we will cover is called MapReduce, which will sound familiar to those of you who have
worked with Hadoop. MapReduce is a powerful mechanism that makes use of MongoDB's built-in JavaScript engine
to run custom code against your data. It uses two JavaScript functions: one to map your data, and another to
transform the mapped data and pull information out of it.
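The division of labor between the two functions can be sketched as follows. MongoDB's MapReduce functions are written in JavaScript; this Python version only illustrates the general map-then-reduce pattern, and the documents, map_fn, reduce_fn, and map_reduce are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample documents for illustration only.
docs = [
    {"color": "red", "qty": 2},
    {"color": "blue", "qty": 5},
    {"color": "red", "qty": 3},
]

def map_fn(doc):
    """Map step: emit (key, value) pairs for a single document."""
    yield doc["color"], doc["qty"]

def reduce_fn(key, values):
    """Reduce step: collapse all values emitted for one key into one result."""
    return sum(values)

def map_reduce(documents, mapper, reducer):
    """Run the mapper over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

totals = map_reduce(docs, map_fn, reduce_fn)
print(totals)
```

Because the mapper handles one document at a time and the reducer handles one key at a time, the same two functions work however many documents the collection holds.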
Probably the most important thing to remember throughout this chapter is that these are truly advanced features,
and misusing them can cause serious performance problems for your MongoDB nodes. Whenever possible, you should
try any of these features in a testing environment before deploying them to important systems.
Text Search
MongoDB's text search works by first creating a full-text index that specifies the fields you wish to be indexed to
facilitate text searching. Building this text index involves going over every document in your collection and
tokenizing and stemming each string of text. Tokenizing breaks the text down into tokens, which conceptually
are close to words. MongoDB then stems each token to find the root concept for the token. For example, suppose that
breaking down a string reaches the token fishing. This token is then stemmed back to the root word fish, so MongoDB
creates an index entry of fish for that document. The same process of tokenizing and stemming is applied to the search
parameters a user enters to perform a given text search. The parameters are then compared against each document,
and a relevance score is calculated. The documents are then returned to the user ordered by their score.
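The tokenize, stem, and score steps can be sketched in a few lines of Python. This is a toy model under loud assumptions: the suffix list is a crude stand-in for a real language-aware stemmer, the score is a simple count of matching stems rather than MongoDB's actual relevance formula, and every name here (tokenize, stem, index_terms, score, search, and the sample documents) is hypothetical.

```python
import re
from collections import Counter

# Toy suffix rules, an assumption for illustration; a real stemmer is
# language-aware and far more thorough.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Break text into lowercase word-like tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def index_terms(text):
    """Count how often each stem occurs in the text."""
    return Counter(stem(t) for t in tokenize(text))

def score(query, text):
    """Toy relevance score: occurrences of the query's stems in the text."""
    terms = index_terms(text)
    return sum(terms[stem(t)] for t in tokenize(query))

# Hypothetical documents keyed by a made-up id.
docs = {
    1: "He went fishing and fished all day",
    2: "The fish market is closed",
    3: "Hiking boots for sale",
}

def search(query):
    """Return ids of matching documents, highest score first."""
    scored = [(score(query, text), doc_id) for doc_id, text in docs.items()]
    hits = [(s, d) for s, d in scored if s > 0]
    return [d for s, d in sorted(hits, key=lambda p: (-p[0], p[1]))]

print(stem("fishing"))
print(search("fishing"))
```

Note that the query is run through the same tokenize and stem steps as the documents, which is what lets a search for fishing match a document that only contains fish.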
 