Databases Reference
In-Depth Information
Even when you're using SphinxSE, the rule of thumb still is to allow searchd to perform
sorting, filtering, and grouping—i.e., to add all the required clauses to the query string
rather than use WHERE , ORDER BY , or GROUP BY . This is especially important for WHERE
conditions. The reason is that SphinxSE is only a client to searchd , not a full-blown
built-in search library. Thus, you need to pass everything that you can to the Sphinx
engine to get the best performance.
Advanced Performance Control
Both indexing and searching operations could impose a significant additional load on
either the search server or the database server. Fortunately, a number of settings let you
limit the load coming from Sphinx.
An undesired database-side load can be caused by indexer queries that either stall
MySQL completely with their locks or just occur too quickly and hog resources from
other concurrent queries.
The first case is a notorious problem with MyISAM, where long-running reads lock the
tables and stall other pending reads and writes—you can't simply do SELECT * FROM
big_table on a production server, because you risk disrupting all other operations. To
work around that, Sphinx offers ranged queries . Instead of configuring a single huge
query, you can specify one query that quickly computes the indexable row ranges and
another query that pulls out the data step by step, in small chunks:
sql_query_range = SELECT MIN(id),MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT id, title, body FROM documents \
WHERE id>=$start AND id<=$end
This feature is extremely helpful for indexing MyISAM tables, but it should also be
considered when using InnoDB tables. Although InnoDB won't just lock the table and
stall other queries when running a big SELECT * , it will still use significant machine
resources because of its MVCC architecture. Multiversioning for a thousand transac-
tions that cover a thousand rows each can be less expensive than a single long-running
million-row transaction.
The second cause of excessive load happens when indexer is able to process the data
more quickly than MySQL provides it. You should also use ranged queries in this case.
The sql_ranged_throttle option forces indexer to sleep for a given time period (in
milliseconds) between subsequent ranged query steps, increasing indexing time but
easing the load on MySQL.
Interestingly enough, there's a special case where you can configure Sphinx to achieve
exactly the opposite effect: that is, you can improve indexing time by placing more load
on MySQL. When the connection between the indexer box and the database box is 100
Mbps, and the rows compress well (which is typical for text data), the MySQL com-
pression protocol can improve overall indexing time. The improvement comes at a
cost of more CPU time spent on both the MySQL and indexer sides to compress and
 
Search WWH ::




Custom Search