Databases Reference
In-Depth Information
that exact phrase will turn up at the top, before songs that just contain “love” or
“dear” many times.
• It makes scaling out much easier.
Applying WHERE Clauses Efficiently
Sometimes you'll need to run SELECT queries against very large tables (containing mil-
lions of records), with several WHERE conditions on columns that have poor index se-
lectivity (i.e., return too many rows for a given WHERE condition) or could not be indexed
at all. Common examples include searching for users in a social network and searching
for items on an auction site. Typical search interfaces let the user apply WHERE conditions
to 10 or more columns, while requiring the results to be sorted by other columns. See
the indexing case study in Chapter 5 for an example of such an application and the
required indexing strategies.
With the proper schema and query optimizations, MySQL can work acceptably for
such queries, as long as the WHERE clauses don't contain too many columns. But as the
number of columns grows, the number of indexes required to support all possible
searches grows exponentially. Covering all the possible combinations for just four col-
umns strains MySQL's limits. It becomes very slow and expensive to maintain the
indexes, too. This means it's practically impossible to have all the required indexes for
many WHERE conditions, and you have to run the queries without indexes.
More importantly, even if you can add indexes, they won't give much benefit unless
they're selective. The classic example is a gender column, which isn't much help because
it typically selects around half of all rows. MySQL will generally revert to a full table
scan when the index isn't selective enough to help it.
Sphinx can perform such queries much faster than MySQL. You can build a Sphinx
index with only the required columns from the data. Sphinx then allows two types of
access to the data: an indexed search on a keyword or a full scan. In both cases, Sphinx
applies filters , which are its equivalent of a WHERE clause. Unlike MySQL, which decides
internally whether to use an index or a full scan, Sphinx lets you choose which access
method to use.
To use a full scan with filters, specify an empty string as the search query. To use an
indexed search, add pseudokeywords to your full-text fields while building the index
and then search for those keywords. For example, if you wanted to search for items in
category 123, you'd add a “category123” keyword to the document during indexing
and then perform a full-text search for “category123.” You can either add keywords to
one of the existing fields using the CONCAT() function, or create a special full-text field
for the pseudokeywords for more flexibility. Normally, you should use filters for non-
selective values that cover over 30% of the rows, and fake keywords for selective ones
that select 10% or less. If the values are in the 10-30% gray zone, your mileage may
vary, and you should use benchmarks to find the best solution.
 
Search WWH ::




Custom Search