REGULAR EXPRESSION PITFALLS
If you use the re.IGNORECASE flag, you're basically back where you were, since the indexes are created as case-sensitive. If you want case-insensitive search, it's typically a good idea to store the data you want to search on in all-lowercase or all-uppercase format.
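As a minimal sketch of the lowercase-storage idea, the helpers below keep a lowercased copy of the title next to the original and build a case-sensitive prefix query against that copy. The field name title_lower and both function names are hypothetical, chosen only for illustration; they are not part of the original schema.

```python
import re


def with_search_title(product):
    """Return a copy of the product with a lowercased copy of its title.

    'title_lower' is a hypothetical field name used only in this sketch.
    """
    doc = dict(product)
    doc['title_lower'] = product['title'].lower()
    return doc


def title_prefix_query(prefix):
    """Build a case-sensitive prefix query against the lowercased field."""
    # Lowercase the user input the same way the stored field was lowercased,
    # and escape it so that it is matched literally.
    return {'title_lower': {'$regex': '^' + re.escape(prefix.lower())}}
```

Because the search text is lowercased the same way as the stored value, the query needs no case-insensitive option, so an ordinary index on the lowercased field remains usable.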
If for some reason you don't want to use a compiled regular expression, MongoDB provides a special syntax for regular expression queries using plain Python dict objects:
    query = db.products.find({
        'type': 'Film',
        'title': {'$regex': '.*hacker.*', '$options': 'i'}})
    query = query.sort([('details.issue_date', -1)])
The indexing strategy for these kinds of queries is different from previous attempts. Here, create an index on { type: 1, 'details.issue_date': -1, title: 1 } using the following command at the Python console:
    >>> db.products.ensure_index([
    ...     ('type', 1),
    ...     ('details.issue_date', -1),
    ...     ('title', 1)])
This index lets MongoDB check the title pattern against the index entries rather than fetching and scanning whole documents for the title field. Additionally, placing the details.issue_date field before the title field in the index supports the sort: the index entries are already in sort order before MongoDB filters on the title field.
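To make the relationship between the query and the index concrete, here is a small sketch that keeps the index key list and the matching query side by side. The names INDEX_KEYS and film_title_search are hypothetical, and db is assumed to be a PyMongo database handle that the caller supplies.

```python
# Key order mirrors the index described above: the equality field first,
# then the sort field, then the field that the regular expression filters on.
INDEX_KEYS = [('type', 1), ('details.issue_date', -1), ('title', 1)]


def film_title_search(db, pattern):
    """Return a cursor over films whose title matches pattern, newest first.

    db is assumed to be a PyMongo database handle; this helper is only an
    illustration and is not part of any library.
    """
    query = {
        'type': 'Film',
        'title': {'$regex': pattern, '$options': 'i'},
    }
    # Sort direction matches the direction of details.issue_date in INDEX_KEYS.
    return db.products.find(query).sort([('details.issue_date', -1)])
```

In recent PyMongo releases the index would be built with db.products.create_index(INDEX_KEYS), since ensure_index was deprecated and later removed. Calling explain() on the returned cursor is one way to check whether the index is actually used in a given deployment.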
Conclusion: Index all the things!
In ecommerce systems, we typically don't know exactly what the user will be filtering on, so it's a good idea to create a number of indexes on queries that are likely to happen. Although such indexing will slow down updates, a product catalog is only very infrequently updated, so this drawback is justified by the significant improvements in search speed. To sum up, if your application has a code path to execute a query, there should be an index to accelerate that query.