Using Sphinx with MySQL - High Performance MySQL

Databases Reference

In-Depth Information

quick in-memory lookup. Also, only the externally stored attributes can be updated at

runtime. Inline storage should be used only when there is not enough free RAM to hold

the attribute data.

Sphinx also supports multivalued attributes (MVAs). MVA content consists of an ar-

bitrarily long list of integer values associated with each document. Examples of good

uses for MVAs are lists of tag IDs, product categories, and access control lists.

Filtering

Having access to attribute values in the full-text engine allows Sphinx to filter and reject

candidate matches as early as possible while searching. Technically, the filter check

occurs after verification that the document contains all the required keywords, but

before certain computationally intensive calculations (such as ranking) are done. Be-

cause of these optimizations, using Sphinx to combine full-text searching with filtering

and sorting can be 10 to 100 times faster than using Sphinx for searching and then

filtering results in MySQL.

Sphinx supports two types of filters, which are analogous to simple WHERE conditions

in SQL:

• An attribute value matches a specified range of values (analogous to a BETWEEN

clause, or numeric comparisons).

• An attribute value matches a specified set of values (analogous to an IN() list).

If the filters will have a fixed number of values (“set” filters instead of “range” filters),

and if such values are selective, it makes sense to replace the integer values with “fake

keywords” and index them as full-text content instead of attributes. This applies to

both normal numeric attributes and MVAs. We'll see some examples of how to do this

later.

Sphinx can also use filters to optimize full scans. Sphinx remembers minimum and

maximum attribute values for short continuous row blocks (128 rows, by default) and

can quickly throw away whole blocks based on filtering conditions. Rows are stored

in the order of ascending document IDs, so this optimization works best for columns

that are correlated with the ID. For instance, if you have a row-insertion timestamp

that grows along with the ID, a full scan with filtering on that timestamp will be very

fast.

The SphinxSE Pluggable Storage Engine

Full-text search results received from Sphinx almost always require additional work

involving MySQL—at the very least, to pull out the text column values that the Sphinx

index does not store. As a result, you'll frequently need to JOIN search results from

Sphinx with other MySQL tables.

Search WWH ::

Custom Search

Home