Databases Reference
In-Depth Information
quick in-memory lookup. Also, only the externally stored attributes can be updated at
runtime. Inline storage should be used only when there is not enough free RAM to hold
the attribute data.
Sphinx also supports multivalued attributes (MVAs). MVA content consists of an ar-
bitrarily long list of integer values associated with each document. Examples of good
uses for MVAs are lists of tag IDs, product categories, and access control lists.
Filtering
Having access to attribute values in the full-text engine allows Sphinx to filter and reject
candidate matches as early as possible while searching. Technically, the filter check
occurs after verification that the document contains all the required keywords, but
before certain computationally intensive calculations (such as ranking) are done. Be-
cause of these optimizations, using Sphinx to combine full-text searching with filtering
and sorting can be 10 to 100 times faster than using Sphinx for searching and then
filtering results in MySQL.
Sphinx supports two types of filters, which are analogous to simple WHERE conditions
in SQL:
• An attribute value matches a specified range of values (analogous to a BETWEEN
clause, or numeric comparisons).
• An attribute value matches a specified set of values (analogous to an IN() list).
If the filters will have a fixed number of values (“set” filters instead of “range” filters),
and if such values are selective, it makes sense to replace the integer values with “fake
keywords” and index them as full-text content instead of attributes. This applies to
both normal numeric attributes and MVAs. We'll see some examples of how to do this
later.
Sphinx can also use filters to optimize full scans. Sphinx remembers minimum and
maximum attribute values for short continuous row blocks (128 rows, by default) and
can quickly throw away whole blocks based on filtering conditions. Rows are stored
in the order of ascending document IDs, so this optimization works best for columns
that are correlated with the ID. For instance, if you have a row-insertion timestamp
that grows along with the ID, a full scan with filtering on that timestamp will be very
fast.
The SphinxSE Pluggable Storage Engine
Full-text search results received from Sphinx almost always require additional work
involving MySQL—at the very least, to pull out the text column values that the Sphinx
index does not store. As a result, you'll frequently need to JOIN search results from
Sphinx with other MySQL tables.
 
Search WWH ::




Custom Search