uncompress the rows transmitted over the network, respectively, but the overall indexing
time could be up to 20-30% less because of greatly reduced network traffic.
Search clusters can suffer from occasional overload, too, so Sphinx provides a few ways
to help avoid searchd going off on a spin.
First, a max_children option simply limits the total number of concurrently running
queries and tells clients to retry when that limit is reached.
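The admission-control idea behind max_children can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not how searchd implements it; the QueryGate class and its try_run method are hypothetical names invented for this sketch.

```python
import threading


class QueryGate:
    """Hypothetical sketch: admit at most max_children concurrent queries."""

    def __init__(self, max_children):
        # One semaphore slot per query that may run at the same time.
        self._slots = threading.BoundedSemaphore(max_children)

    def try_run(self, query_fn):
        # Non-blocking acquire: when every slot is busy, do not queue the
        # query; tell the caller to retry later instead.
        if not self._slots.acquire(blocking=False):
            return ("retry", None)
        try:
            return ("ok", query_fn())
        finally:
            # Free the slot even if the query raised an exception.
            self._slots.release()
```

Rejecting excess queries immediately, rather than queueing them, keeps the server from accumulating a backlog it cannot clear; the client decides when to try again.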
Then there are query-level limits. You can specify that query processing stop either at
a given threshold of matches found or a given threshold of elapsed time, using the
SetLimits() and SetMaxQueryTime() API calls, respectively. This is done on a per-query
basis, so you can ensure that more important queries always complete fully.
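To make the two stopping conditions concrete, here is a small self-contained Python simulation. It does not call the Sphinx API; run_query, its parameters, and the returned status strings are assumptions made up for this sketch, which only mimics the "stop at N matches" and "stop after T milliseconds" behavior described above.

```python
import time


def run_query(candidates, matches_fn, cutoff=0, max_query_time_ms=0,
              clock=time.monotonic):
    """Hypothetical sketch: scan candidates, stopping early on either limit.

    A value of 0 for cutoff or max_query_time_ms means "no limit".
    Returns (matches_found_so_far, reason_for_stopping).
    """
    start = clock()
    found = []
    for doc in candidates:
        # Elapsed-time threshold: return whatever has been found so far.
        if max_query_time_ms and (clock() - start) * 1000.0 >= max_query_time_ms:
            return found, "time_limit"
        if matches_fn(doc):
            found.append(doc)
            # Match-count threshold: stop as soon as enough matches exist.
            if cutoff and len(found) >= cutoff:
                return found, "cutoff"
    return found, "complete"
```

Because the limits are arguments of each call, a low-priority query can pass tight limits while an important one passes none and runs to completion.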
Finally, periodic indexer runs can cause bursts of additional I/O that will in turn cause
intermittent searchd slowdowns. To prevent that, options that limit indexer disk I/O
exist. max_iops enforces a minimal delay between I/O operations that ensures that no
more than max_iops disk operations per second will be performed. But even a single
operation could be too much; consider a 100 MB read() call as an example. The
max_iosize option takes care of that, guaranteeing that the length of every disk read
or write will be under a given boundary. Larger operations are automatically split into
smaller ones, and these smaller ones are then controlled by max_iops settings.
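The interplay of the two options can be sketched as follows. This is an illustrative Python model, not indexer's actual code: the ThrottledIO class and its methods are hypothetical, and only the option names max_iops and max_iosize come from the text. Each large read is first split into pieces of at most max_iosize bytes, and every piece then waits for the minimal delay of 1/max_iops seconds since the previous operation.

```python
import time


class ThrottledIO:
    """Hypothetical sketch of max_iops / max_iosize style throttling."""

    def __init__(self, max_iops=0, max_iosize=0,
                 clock=time.monotonic, sleep=time.sleep):
        # 0 disables the corresponding limit.
        self.min_delay = 1.0 / max_iops if max_iops else 0.0
        self.max_iosize = max_iosize
        self._clock = clock
        self._sleep = sleep
        self._last_op = None  # start time of the previous I/O operation

    def _wait_turn(self):
        # Enforce the minimal delay between consecutive operations.
        now = self._clock()
        if self._last_op is not None:
            remaining = self.min_delay - (now - self._last_op)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_op = now

    def read(self, f, length):
        # Split one large read into chunks of at most max_iosize bytes;
        # each chunk counts as a separate, rate-limited operation.
        chunk = self.max_iosize or length
        parts = []
        while length > 0:
            self._wait_turn()
            data = f.read(min(chunk, length))
            if not data:
                break  # end of file
            parts.append(data)
            length -= len(data)
        return b"".join(parts)
```

With max_iops=10 and max_iosize=4, for example, a 10-byte read becomes three operations of 4, 4, and 2 bytes spaced at least 0.1 seconds apart.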
Practical Implementation Examples
Each of the features we've described can be found successfully deployed in production.
The following sections review several of these real-world Sphinx deployments, briefly
describing the sites and some implementation details.
Full-Text Searching on Mininova.org
A popular torrent search engine, Mininova (http://www.mininova.org), provides a clear
example of how to optimize “just” full-text searching. Sphinx replaced several MySQL
replicas using MySQL built-in full-text indexes, which were unable to handle the load.
After the replacement, the search servers were underloaded; the load average
is now in the 0.3-0.4 range.
Here are the database size and load numbers:
• The site has a small database, with about 300,000-500,000 records and about
300-500 MB of index.
• The site load is quite high: about 8-10 million searches per day at the time of this
writing.
The data mostly consists of user-supplied filenames, frequently without proper
punctuation. For this reason, prefix indexing is used instead of whole-word indexing. The