Using Sphinx with MySQL - High Performance MySQL


modern hardware. In this case, the performance can be many times better than

MySQL's, though the results will be approximate.

The most important difference from MySQL's GROUP BY is that Sphinx may, under

certain circumstances, yield approximate results. There are two reasons for this:

• Grouping uses a fixed amount of memory. If there are too many groups to hold in

RAM and the matches are in a certain “unfortunate” order, per-group counts might

be smaller than the actual values.

• A distributed search sends only the aggregate results, not the matches themselves,

from node to node. If there are duplicate records in different nodes, per-group

distinct counts might be greater than the actual values, because the information

that can remove the duplicates is not transmitted between nodes.
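Both effects can be reproduced with a toy model. The sketch below is not Sphinx's actual algorithm. It assumes a grouper that evicts its smallest group when memory is full, and nodes that report only a per-group distinct count. The function names and sample data are made up for illustration.

```php
<?php
// Toy models only; Sphinx's real internals are more involved.

// Cause 1: a grouper that can remember at most $maxGroups groups.
// When full, it evicts the smallest group; if that group shows up
// again later, its count restarts from zero and ends up too low.
function countGroupsWithLimit(array $matches, int $maxGroups): array
{
    $counts = [];
    foreach ($matches as $group) {
        if (isset($counts[$group])) {
            $counts[$group]++;
            continue;
        }
        if (count($counts) >= $maxGroups) {
            asort($counts);                            // smallest count first
            unset($counts[array_key_first($counts)]);  // drop that group
        }
        $counts[$group] = 1;
    }
    return $counts;
}

// Cause 2: each node reports only its own distinct count for a group,
// so the aggregator can do no better than add the numbers up.
function mergeDistinctCounts(array $nodes): int
{
    $total = 0;
    foreach ($nodes as $sellerIds) {
        $total += count(array_unique($sellerIds));
    }
    return $total;
}

// Group "b" really has 2 matches, but it is evicted once in between.
$limited = countGroupsWithLimit(['a', 'b', 'a', 'c', 'b'], 2);
echo "b with limited memory: {$limited['b']} (actual: 2)\n";

// Seller 103 has records on both nodes, so it is counted twice.
$nodes  = [[101, 102, 103], [103, 104]];
$merged = mergeDistinctCounts($nodes);
$exact  = count(array_unique(array_merge(...$nodes)));
echo "distinct sellers: merged=$merged, exact=$exact\n";
```

The first count comes out too low and the second too high, matching the direction of the errors described in the two bullets.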

In practice, it is often acceptable to have fast approximate group-by counts. If this isn't

acceptable, it's often possible to get exact results by configuring the daemon and client

application carefully.

You can generate the equivalent of COUNT(DISTINCT <attribute>), too. For example, you

can use this to compute the number of distinct sellers per category in an auction site.
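With the PHP API, the sellers-per-category case might look like the following sketch. It relies on the SetGroupDistinct() call and the @distinct virtual attribute of the Sphinx API. The index name auctions and the attributes category_id and seller_id are hypothetical, and the caller is assumed to pass in a connected SphinxClient.

```php
<?php
// Assumes the SphinxClient class is already loaded (sphinxapi.php or
// the PECL sphinx extension). Index and attribute names are hypothetical.
function distinctSellersPerCategory($cl, string $query): array
{
    // One result row per category, largest categories first.
    $cl->SetGroupBy("category_id", SPH_GROUPBY_ATTR, "@count desc");
    // Ask for the equivalent of COUNT(DISTINCT seller_id) in each group.
    $cl->SetGroupDistinct("seller_id");

    $result = $cl->Query($query, "auctions");
    if ($result === false) {
        throw new RuntimeException($cl->GetLastError());
    }

    // Map each category to its match count and distinct seller count.
    $report = [];
    foreach ($result["matches"] ?? [] as $match) {
        $attrs = $match["attrs"];
        $report[$attrs["@groupby"]] = [
            "matches" => $attrs["@count"],
            "sellers" => $attrs["@distinct"],
        ];
    }
    return $report;
}
```

Calling distinctSellersPerCategory($cl, "ipod") would return an array keyed by category, subject to the approximation caveats described above.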

Finally, Sphinx lets you choose criteria to select the single “best” document within each

group. For example, you can select the most relevant document from each domain,

while grouping by domain and sorting the result set by per-domain match counts. This

is not possible in MySQL without a complex query.
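The per-domain example could be sketched as follows. In the Sphinx API the sort mode determines which match represents each group, while the third argument of SetGroupBy() orders the groups themselves. The attribute name domain_id is an assumption, and the caller is assumed to supply a connected SphinxClient.

```php
<?php
// Assumes the SphinxClient class is already loaded (sphinxapi.php or
// the PECL sphinx extension). The domain_id attribute is hypothetical.
function bestDocumentPerDomain($cl, string $query, string $index = "*"): array
{
    // Return matches as a plain list, with the document ID in each row.
    $cl->SetArrayResult(true);
    // Within each group, keep the most relevant document...
    $cl->SetSortMode(SPH_SORT_RELEVANCE);
    // ...and order the groups by how many matches each domain has.
    $cl->SetGroupBy("domain_id", SPH_GROUPBY_ATTR, "@count desc");

    $result = $cl->Query($query, $index);
    if ($result === false) {
        throw new RuntimeException($cl->GetLastError());
    }

    $rows = [];
    foreach ($result["matches"] ?? [] as $match) {
        $rows[] = [
            "domain_id" => $match["attrs"]["@groupby"],
            "best_doc"  => $match["id"],
            "weight"    => $match["weight"],
            "matches"   => $match["attrs"]["@count"],
        ];
    }
    return $rows;
}
```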

Generating Parallel Result Sets

Sphinx lets you generate several results from the same data simultaneously, again using

a fixed amount of memory. Compared to the traditional SQL approach of either running

two queries (and hoping that some data stays in the cache between runs) or creating

a temporary table for each search result set, this yields a noticeable improvement.

For example, assume you need per-day, per-week, and per-month reports over a period

of time. To generate these with MySQL you'd have to run three queries with different

GROUP BY clauses, processing the source data three times. Sphinx, however, can process

the underlying data once and accumulate all three reports in parallel.
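The single-pass idea can be illustrated without Sphinx at all. The sketch below is plain PHP over made-up timestamps, not Sphinx code: it walks the rows once and updates three sets of counters on each row.

```php
<?php
// Toy data: one UTC timestamp per matching row.
$timestamps = [
    gmmktime(10, 0, 0, 1, 30, 2012),  // Mon 2012-01-30
    gmmktime(18, 0, 0, 1, 30, 2012),  // same day
    gmmktime( 9, 0, 0, 2,  1, 2012),  // Wed 2012-02-01, same ISO week
    gmmktime( 9, 0, 0, 2,  7, 2012),  // Tue 2012-02-07, next ISO week
];

$perDay = $perWeek = $perMonth = [];

// One pass over the data; every row feeds all three reports.
foreach ($timestamps as $ts) {
    $day   = gmdate('Y-m-d', $ts);
    $week  = gmdate('o-\WW', $ts);   // ISO year and week number
    $month = gmdate('Y-m', $ts);

    $perDay[$day]     = ($perDay[$day] ?? 0) + 1;
    $perWeek[$week]   = ($perWeek[$week] ?? 0) + 1;
    $perMonth[$month] = ($perMonth[$month] ?? 0) + 1;
}

print_r($perDay);    // 2012-01-30 => 2, 2012-02-01 => 1, 2012-02-07 => 1
print_r($perWeek);   // 2012-W05 => 3, 2012-W06 => 1
print_r($perMonth);  // 2012-01 => 2, 2012-02 => 2
```

Three separate GROUP BY queries would each repeat the loop over the source rows. Here the cost of reading the data is paid once.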

Sphinx does this with a multi-query mechanism. Instead of issuing queries one by one,

you batch several queries and submit them in one request:

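<?php

// Assumes the SphinxClient class is available (sphinxapi.php or the PECL extension).

$cl = new SphinxClient ();

// Query 1: matches for "ipod", most expensive first.

$cl->SetSortMode ( SPH_SORT_EXTENDED, "price desc" );

$cl->AddQuery ( "ipod" );

// Query 2: the same search, grouped by category, largest groups first.

$cl->SetGroupBy ( "category_id", SPH_GROUPBY_ATTR, "@count desc" );

$cl->AddQuery ( "ipod" );

// Send both queries in one request; one result set comes back per query.

$results = $cl->RunQueries ();

?>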
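The same mechanism could cover the per-day, per-week, and per-month reports mentioned earlier. The following is a sketch under assumptions: it uses the API's SPH_GROUPBY_DAY, SPH_GROUPBY_WEEK, and SPH_GROUPBY_MONTH grouping functions on a hypothetical timestamp attribute named added_ts, and expects the caller to pass in a connected SphinxClient.

```php
<?php
// Assumes the SphinxClient class is already loaded (sphinxapi.php or
// the PECL sphinx extension). The added_ts attribute is hypothetical.
function dateReports($cl, string $query, string $index = "*"): array
{
    $funcs = [
        "day"   => SPH_GROUPBY_DAY,
        "week"  => SPH_GROUPBY_WEEK,
        "month" => SPH_GROUPBY_MONTH,
    ];

    // Queue one query per report; AddQuery() returns its slot in the batch.
    $slots = [];
    foreach ($funcs as $name => $func) {
        $cl->SetGroupBy("added_ts", $func, "@group desc");
        $slots[$name] = $cl->AddQuery($query, $index);
    }

    // A single round trip runs the whole batch.
    $results = $cl->RunQueries();
    if ($results === false) {
        throw new RuntimeException($cl->GetLastError());
    }

    // Turn each result set into a "period => match count" map.
    $reports = [];
    foreach ($slots as $name => $slot) {
        $reports[$name] = [];
        foreach ($results[$slot]["matches"] ?? [] as $match) {
            $attrs = $match["attrs"];
            $reports[$name][$attrs["@groupby"]] = $attrs["@count"];
        }
    }
    return $reports;
}
```

Because each AddQuery() call captures the settings in effect at that moment, changing the grouping function between calls is enough to queue three different reports for the same search.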
