Special Features
Besides “just” indexing and searching through database content, Sphinx offers several
other special features. Here's a partial list of the most important ones:
• The searching and ranking algorithms take word positions and the query phrase's
proximity to the document content into account.
• You can bind numeric attributes to documents, including multivalued attributes.
• You can sort, filter, and group by attribute values.
• You can create document snippets with search query keyword highlighting.
• You can distribute searching across several machines.
• You can optimize queries that generate several result sets from the same data.
• You can access the search results from within MySQL using SphinxSE.
• You can fine-tune the load Sphinx imposes on the server.
We covered some of these features earlier. This section covers a few of the remaining
features.
Phrase Proximity Ranking
Sphinx remembers word positions within each document, as do other open source
full-text search systems. Unlike most of them, however, it also uses those positions
to rank matches and return more relevant results.
A number of factors might contribute to a document's final rank. To compute the rank,
most other systems use only keyword frequency: the number of times each keyword
occurs. The classic BM25 weighting function¹ that virtually all full-text search systems
use is built around giving more weight to words that either occur frequently in the
particular document being searched or occur rarely in the whole collection. The BM25
result is usually returned as the final rank value.
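To make the two effects concrete, here is a minimal sketch of a textbook Okapi BM25 score for one document. It is not Sphinx's internal implementation: the function name bm25_score, the parameter defaults k1=1.2 and b=0.75, and the "+1" inside the logarithm (a common variant that keeps the weight non-negative) are all assumptions made for illustration.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Textbook-style BM25 score of one document for a bag of query terms.

    doc_freq maps a term to the number of documents containing it.
    k1 and b are the usual tuning constants (illustrative defaults).
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)          # keyword frequency in this document
        if tf == 0:
            continue                        # absent terms contribute nothing
        n = doc_freq.get(term, 0)
        # Rare terms in the collection get a larger weight.
        idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1.0)
        # Frequent terms in the document get a larger (but saturating) weight,
        # normalized by document length.
        tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_part
    return score
```

With the same document, a query term found in only a few documents of the collection scores higher than one found in half of them, and repeating a term within the document raises its contribution. Word positions never enter the calculation.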
In contrast, Sphinx also computes query phrase proximity, which is simply the length
of the longest verbatim query subphrase contained in the document, counted in words.
For instance, the phrase “John Doe Jr” queried against a document with the text “John
Black, John White Jr, and Jane Dunne” will produce a phrase proximity of 1, because
no two words in the query appear together in the query order. The same query against
“Mr. John Doe Jr and friends” will yield a proximity of 3, because three query words
occur in the document in the query order. The document “John Gray, Jane Doe Jr” will
produce a proximity of 2, thanks to its “Doe Jr” query subphrase.
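The definition above can be sketched in a few lines of Python. This is only an illustration of the idea, not Sphinx's code: the names tokenize and phrase_proximity are hypothetical, and the tokenizer (lowercase, split on non-word characters) is a simplifying assumption. The function finds the longest run of consecutive query words that also appears, in the same order, as consecutive words in the document.

```python
import re


def tokenize(text):
    # Simplifying assumption: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def phrase_proximity(query, document):
    """Length, in words, of the longest verbatim query subphrase in the document."""
    q = tokenize(query)
    d = tokenize(document)
    best = 0
    # prev[j] holds the length of the matching run that ends at the previous
    # query word and at document word j (1-based); 0 means no run ends there.
    prev = [0] * (len(d) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            if q[i - 1] == d[j - 1]:
                cur[j] = prev[j - 1] + 1    # extend the run ending just before
                best = max(best, cur[j])
        prev = cur
    return best


for text in ("John Black, John White Jr, and Jane Dunne",
             "Mr. John Doe Jr and friends",
             "John Gray, Jane Doe Jr"):
    print(phrase_proximity("John Doe Jr", text), text)
```

Run against the three sample documents, the sketch reproduces the proximities of 1, 3, and 2 given in the text.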
1. See http://en.wikipedia.org/wiki/Okapi_BM25 for details.
 