Databases Reference
In-Depth Information
Even when you're using SphinxSE, the rule of thumb still is to allow
searchd
to perform
sorting, filtering, and grouping—i.e., to add all the required clauses to the query string
rather than use
WHERE
,
ORDER BY
, or
GROUP BY
. This is especially important for
WHERE
conditions. The reason is that SphinxSE is only a client to
searchd
, not a full-blown
built-in search library. Thus, you need to pass everything that you can to the Sphinx
engine to get the best performance.
Advanced Performance Control
Both indexing and searching operations could impose a significant additional load on
either the search server or the database server. Fortunately, a number of settings let you
limit the load coming from Sphinx.
An undesired database-side load can be caused by
indexer
queries that either stall
MySQL completely with their locks or just occur too quickly and hog resources from
other concurrent queries.
The first case is a notorious problem with MyISAM, where long-running reads lock the
tables and stall other pending reads and writes—you can't simply do
SELECT * FROM
big_table
on a production server, because you risk disrupting all other operations. To
work around that, Sphinx offers
ranged queries
. Instead of configuring a single huge
query, you can specify one query that quickly computes the indexable row ranges and
another query that pulls out the data step by step, in small chunks:
sql_query_range = SELECT MIN(id),MAX(id) FROM documents
sql_range_step = 1000
sql_query = SELECT id, title, body FROM documents \
WHERE id>=$start AND id<=$end
This feature is extremely helpful for indexing MyISAM tables, but it should also be
considered when using InnoDB tables. Although InnoDB won't just lock the table and
stall other queries when running a big
SELECT *
, it will still use significant machine
resources because of its MVCC architecture. Multiversioning for a thousand transac-
tions that cover a thousand rows each can be less expensive than a single long-running
million-row transaction.
The second cause of excessive load happens when
indexer
is able to process the data
more quickly than MySQL provides it. You should also use ranged queries in this case.
The
sql_ranged_throttle
option forces
indexer
to sleep for a given time period (in
milliseconds) between subsequent ranged query steps, increasing indexing time but
easing the load on MySQL.
Interestingly enough, there's a special case where you can configure Sphinx to achieve
exactly the opposite effect: that is, you can improve indexing time by placing
more
load
on MySQL. When the connection between the
indexer
box and the database box is 100
Mbps, and the rows compress well (which is typical for text data), the MySQL com-
pression protocol can improve overall indexing time. The improvement comes at a
cost of more CPU time spent on both the MySQL and
indexer
sides to compress and