Find movies with a particular word in the title
In most relational and document-based databases, querying for a single word within a string-
type field requires scanning, making this query much less efficient than the others mentioned
here.
One of the most-requested features of MongoDB is the so-called full-text index, which makes
queries such as this one more efficient. In a full-text index, the individual words (sometimes
even subwords) that occur in a field are indexed separately. In exciting and recent (as of the
writing of this section) news, development builds of MongoDB currently contain a basic
full-text search index, slated for inclusion in the next major release of MongoDB. Until
MongoDB's full-text search index shows up in a stable version of MongoDB, however, the best
approach is probably deploying a separate full-text search engine (such as Apache Solr or
ElasticSearch) alongside MongoDB, if you're going to be doing a lot of text-based queries.
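To make the idea of indexing individual words concrete, here is a minimal in-memory sketch in plain Python. It is only an illustration of the general technique (an inverted index), not a description of how MongoDB implements its text index; the function names build_word_index and find_by_word and the sample titles are hypothetical.

```python
import re
from collections import defaultdict

def build_word_index(titles):
    """Map each lowercased word to the positions of the titles containing it."""
    index = defaultdict(set)
    for pos, title in enumerate(titles):
        for word in re.findall(r"\w+", title.lower()):
            index[word].add(pos)
    return index

def find_by_word(index, titles, word):
    """Look up one word in the index instead of scanning every title."""
    return [titles[pos] for pos in sorted(index.get(word.lower(), ()))]

# Hypothetical sample data, for illustration only
titles = ["Hackers", "The Hacker Wars", "A Few Good Men", "Good Will Hunting"]
index = build_word_index(titles)
print(find_by_word(index, titles, "good"))
```

Note that this naive version indexes whole words only, so a lookup for "hacker" does not find "Hackers"; real full-text engines typically add stemming or subword indexing to handle such cases.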
Although there is currently no efficient full-text search support within MongoDB, there is
support for using regular expressions (regexes) with queries. In Python, we can pass a
compiled regex from the re module to the find() operation directly:
    import re
    re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE)
    query = db.products.find({'type': 'Film', 'title': re_hacker})
    query = query.sort([('details.issue_date', -1)])
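The reason such a query is slow can be seen by applying the same pattern outside the database: every candidate title has to be tested, because a match may start anywhere in the string. The following is a rough sketch in plain Python, with a hypothetical list of titles standing in for the title field of the products collection.

```python
import re

re_hacker = re.compile(r'.*hacker.*', re.IGNORECASE)

# Hypothetical sample titles, for illustration only
titles = ["Hackers", "The Hacker Wars", "A Few Good Men", "HACKER'S GAME"]

# Each title is examined in turn; nothing lets us skip ahead to the matches
matches = [t for t in titles if re_hacker.search(t)]
print(matches)
```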
Although this query isn't particularly fast, there is a type of regex search that makes good
use of the indexes that MongoDB does support: the prefix regex. Explicitly matching the
beginning of the string, followed by a few prefix characters for the field you're searching
for, allows MongoDB to use a “regular” index efficiently:
    import re
    re_prefix = re.compile(r'^A Few Good.*')
    query = db.products.find({'type': 'Film', 'title': re_prefix})
    query = query.sort([('details.issue_date', -1)])
In this query, since we've matched the prefix of the title, MongoDB can seek directly to the
titles we're interested in.
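The "seek" can be pictured with a sorted list and a binary search: because an ordinary index keeps values in sorted order, all titles sharing a prefix sit next to each other. The sketch below is an analogy in plain Python, not MongoDB's actual index code; prefix_range and the sample titles are hypothetical.

```python
from bisect import bisect_left

def prefix_range(sorted_titles, prefix):
    """Return the titles that start with prefix, located via binary search."""
    # Jump straight to the first title that could start with the prefix
    start = bisect_left(sorted_titles, prefix)
    end = start
    # Walk forward only while titles still share the prefix
    while end < len(sorted_titles) and sorted_titles[end].startswith(prefix):
        end += 1
    return sorted_titles[start:end]

# Hypothetical sample data, kept sorted the way an index would keep it
titles = sorted(["Zodiac", "A Few Good Men", "Hackers",
                 "A Fistful of Dollars", "A Few Good Years"])
print(prefix_range(titles, "A Few Good"))
```

Only the matching titles (plus one non-matching neighbor) are touched, which is why an anchored, case-sensitive prefix pattern is so much cheaper than an unanchored one.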