uncompress the rows transmitted over the network, respectively, but the overall indexing
time could be up to 20-30% less because of greatly reduced network traffic.
Search clusters can suffer from occasional overload, too, so Sphinx provides a few ways
to help avoid searchd going off on a spin.
First, the max_children option simply limits the total number of concurrently running
queries and tells clients to retry when that limit is reached.
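The idea behind such a limit can be sketched in a few lines of Python. This is only an illustration of the "refuse and ask the client to retry" behavior, not searchd's actual implementation; the QueryLimiter class and its try_run() method are hypothetical names invented for this sketch.

```python
import threading


class QueryLimiter:
    """Hypothetical stand-in for a max_children-style limit (not searchd code)."""

    def __init__(self, max_children):
        self._slots = threading.BoundedSemaphore(max_children)

    def try_run(self, query_fn):
        # Non-blocking acquire: when every slot is busy, refuse immediately
        # so the client knows to retry later instead of piling up.
        if not self._slots.acquire(blocking=False):
            return ("retry", None)
        try:
            return ("ok", query_fn())
        finally:
            self._slots.release()


limiter = QueryLimiter(max_children=1)


def busy_query():
    # A second query arriving while this one still runs is told to retry.
    return limiter.try_run(lambda: "never runs")


first = limiter.try_run(busy_query)      # ("ok", ("retry", None))
after = limiter.try_run(lambda: "hits")  # ("ok", "hits"): the slot was freed
```

Rejecting excess queries outright, rather than queueing them, keeps the number of running queries bounded during an overload.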
Then there are query-level limits. You can specify that query processing stop either at
a given threshold of matches found or a given threshold of elapsed time, using the
SetLimits() and SetMaxQueryTime() API calls, respectively. This is done on a per-query
basis, so you can ensure that more important queries always complete fully.
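To make the two kinds of query-level limits concrete, here is a minimal Python simulation of early termination. It does not call the Sphinx API; run_query() and its cutoff and max_query_time parameters are hypothetical names for this sketch, and 0 is assumed to mean "no limit".

```python
import time


def run_query(candidates, matches_fn, cutoff=0, max_query_time=0.0,
              clock=time.monotonic):
    """Scan candidates, stopping early at a match cutoff or a time budget.

    Hypothetical sketch: 0 disables the corresponding limit.
    """
    started = clock()
    matches = []
    stopped_by = None
    for doc in candidates:
        # Time limit: give up once the elapsed time reaches the budget.
        if max_query_time and clock() - started >= max_query_time:
            stopped_by = "time"
            break
        if matches_fn(doc):
            matches.append(doc)
            # Match limit: stop as soon as enough matches were found.
            if cutoff and len(matches) >= cutoff:
                stopped_by = "cutoff"
                break
    return matches, stopped_by


def is_even(doc_id):
    return doc_id % 2 == 0


docs = list(range(1, 1001))

# A less important query gives up after 5 matches ...
cheap, why_cheap = run_query(docs, is_even, cutoff=5)
# ... while an important one is run without limits and completes fully.
full, why_full = run_query(docs, is_even)
```

Because the limits are arguments of each call, the caller decides per query how much work it is worth.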
Finally, periodic indexer runs can cause bursts of additional I/O that will in turn cause
intermittent searchd slowdowns. To prevent that, options that limit indexer disk I/O
exist. max_iops enforces a minimal delay between I/O operations that ensures that no
more than max_iops disk operations per second will be performed. But even a single
operation could be too much; consider a 100 MB read() call as an example. The
max_iosize option takes care of that, guaranteeing that the length of every disk read
or write will be under a given boundary. Larger operations are automatically split into
smaller ones, and these smaller ones are then controlled by the max_iops setting.
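How the two I/O limits combine can be sketched as follows. This is a simplified Python model, not the indexer's real code: throttled_read() is a hypothetical helper, 0 is assumed to disable a limit, and the sleep and clock parameters exist only so the example can run without actually pausing.

```python
import io
import time


def throttled_read(stream, total_bytes, max_iops=0, max_iosize=0,
                   sleep=time.sleep, clock=time.monotonic):
    """Read total_bytes in chunks of at most max_iosize bytes, issuing at
    most max_iops reads per second. Hypothetical sketch; 0 disables a limit.
    """
    min_delay = 1.0 / max_iops if max_iops else 0.0
    chunk_size = max_iosize if max_iosize else total_bytes
    chunks = []
    remaining = total_bytes
    last_op = None
    while remaining > 0:
        # Enforce the minimal delay between consecutive I/O operations.
        if last_op is not None and min_delay:
            wait = min_delay - (clock() - last_op)
            if wait > 0:
                sleep(wait)
        last_op = clock()
        # A large request is split into pieces no bigger than chunk_size.
        data = stream.read(min(chunk_size, remaining))
        if not data:
            break  # end of stream reached early
        chunks.append(data)
        remaining -= len(data)
    return chunks


source = io.BytesIO(b"x" * 2500)
pauses = []  # records requested delays instead of really sleeping
parts = throttled_read(source, 2500, max_iops=40, max_iosize=1024,
                       sleep=pauses.append, clock=lambda: 0.0)
sizes = [len(p) for p in parts]  # [1024, 1024, 452]
```

With these example values, one 2,500-byte read becomes three smaller reads with a 1/40-second pause requested before the second and third.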
Practical Implementation Examples
Each of the features we've described can be found successfully deployed in production.
The following sections review several of these real-world Sphinx deployments, briefly
describing the sites and some implementation details.
Full-Text Searching on Mininova.org
A popular torrent search engine, Mininova (http://www.mininova.org) provides a clear
example of how to optimize “just” full-text searching. Sphinx replaced several MySQL
replicas using MySQL built-in full-text indexes, which were unable to handle the load.
After the replacement, the search servers were underloaded; the load average
is now in the 0.3-0.4 range.
Here are the database size and load numbers:
• The site has a small database, with about 300,000-500,000 records and about
300-500 MB of index.
• The site load is quite high: about 8-10 million searches per day at the time of this
writing.
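To put the daily figure in per-second terms, the short calculation below converts it to an average rate. It assumes searches are spread evenly over the day, which real traffic is not, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(searches_per_day):
    # Average rate only; assumes a perfectly even spread over the day.
    return searches_per_day / SECONDS_PER_DAY


low = average_qps(8_000_000)
high = average_qps(10_000_000)
summary = f"{low:.0f}-{high:.0f} searches per second on average"
print(summary)  # prints "93-116 searches per second on average"
```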
The data mostly consists of user-supplied filenames, frequently without proper
punctuation. For this reason, prefix indexing is used instead of whole-word indexing. The
 