modern hardware. In this case, the performance can be many times better than
MySQL's, though the results will be approximate.
The most important difference from MySQL's GROUP BY is that Sphinx may, under
certain circumstances, yield approximate results. There are two reasons for this:
• Grouping uses a fixed amount of memory. If there are too many groups to hold in
RAM and the matches are in a certain “unfortunate” order, per-group counts might
be smaller than the actual values.
• A distributed search sends only the aggregate results, not the matches themselves,
from node to node. If there are duplicate records in different nodes, per-group
distinct counts might be greater than the actual values, because the information
that can remove the duplicates is not transmitted between nodes.
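Both effects can be reproduced with a toy model. The sketch below is plain Python, not Sphinx code. The function bounded_group_counts and its evict-the-smallest policy are assumptions made purely for illustration, since the text does not describe how Sphinx manages its grouping memory. The function merged_distinct_count simply adds per-node distinct counts, which is all an aggregator can do when nodes exchange only aggregates.

```python
from collections import Counter

def bounded_group_counts(keys, max_groups):
    """Count matches per group while holding at most max_groups groups.

    Assumed policy (hypothetical, for illustration only): when the table is
    full, the smallest group is evicted to make room for a new one.
    """
    counts = {}
    for key in keys:
        if key in counts:
            counts[key] += 1
            continue
        if len(counts) >= max_groups:
            victim = min(counts, key=counts.get)  # smallest current group
            del counts[victim]                    # its tally is lost
        counts[key] = 1
    return counts

def merged_distinct_count(nodes):
    """Sum per-node distinct counts, as if only aggregates were exchanged."""
    return sum(len(set(values)) for values in nodes)

# Reason 1: an "unfortunate" order evicts group "a" before it reappears.
stream = ["a", "b", "c", "a", "a"]
exact = Counter(stream)
approx = bounded_group_counts(stream, max_groups=2)
print(exact["a"], approx["a"])  # exact count vs. smaller approximate count

# Reason 2: seller 7 is indexed on both nodes, so it is counted twice.
node_sellers = [[3, 7, 9], [7, 12]]
true_distinct = len(set().union(*node_sellers))
print(true_distinct, merged_distinct_count(node_sellers))
```

In the first case the approximate count falls below the exact one. In the second the merged count exceeds the true number of distinct sellers. These are the two directions of error described above.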
In practice, it is often acceptable to have fast approximate group-by counts. If this isn't
acceptable, it's often possible to get exact results by configuring the daemon and client
application carefully.
You can generate the equivalent of COUNT(DISTINCT <attribute>), too. For example, you
can use this to compute the number of distinct sellers per category on an auction site.
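To make the auction example concrete, here is a minimal Python sketch of what a per-group distinct count computes. It illustrates the semantics only and is not how Sphinx implements it. The function name, the (category, seller) tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict

def distinct_sellers_per_category(lots):
    """Return {category: number of distinct sellers} for (category, seller) pairs."""
    sellers = defaultdict(set)
    for category, seller in lots:
        sellers[category].add(seller)  # a set ignores repeated sellers
    return {category: len(ids) for category, ids in sellers.items()}

# Hypothetical auction lots: alice lists two phones but counts once.
lots = [
    ("phones", "alice"), ("phones", "bob"), ("phones", "alice"),
    ("cameras", "carol"),
]
per_category = distinct_sellers_per_category(lots)
print(per_category)
```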
Finally, Sphinx lets you choose criteria to select the single “best” document within each
group. For example, you can select the most relevant document from each domain,
while grouping by domain and sorting the result set by per-domain match counts. This
is not possible in MySQL without a complex query.
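The per-domain example can be sketched in a few lines of Python. This is an illustration of the result being described, not of Sphinx internals. The function name best_per_group, the (domain, doc_id, relevance) tuple layout, and the sample relevance values are assumptions.

```python
def best_per_group(matches):
    """Group (domain, doc_id, relevance) matches by domain.

    Returns (domain, match_count, best_doc_id) tuples, largest groups first.
    """
    groups = {}
    for domain, doc_id, relevance in matches:
        count, best_doc, best_rel = groups.get(domain, (0, None, float("-inf")))
        if relevance > best_rel:
            best_doc, best_rel = doc_id, relevance  # new best for this group
        groups[domain] = (count + 1, best_doc, best_rel)
    rows = [(domain, count, doc) for domain, (count, doc, _) in groups.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical matches for one query.
matches = [
    ("example.org", 1, 0.4), ("example.com", 2, 0.9),
    ("example.org", 3, 0.8), ("example.org", 4, 0.5),
]
ranked = best_per_group(matches)
print(ranked)
```

Note that the in-group criterion (relevance) and the ordering of the groups themselves (match count) are independent, which is what makes the equivalent SQL awkward to write.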
Generating Parallel Result Sets
Sphinx lets you generate several results from the same data simultaneously, again using
a fixed amount of memory. Compared to the traditional SQL approach of either running
two queries (and hoping that some data stays in the cache between runs) or creating
a temporary table for each search result set, this yields a noticeable improvement.
For example, assume you need per-day, per-week, and per-month reports over a period
of time. To generate these with MySQL you'd have to run three queries with different
GROUP BY clauses, processing the source data three times. Sphinx, however, can process
the underlying data once and accumulate all three reports in parallel.
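The single-pass idea can be sketched in Python as follows. This is only a model of the principle (read each row once, update every report), not Sphinx code. The function name accumulate_reports and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

def accumulate_reports(dates):
    """Build per-day, per-week, and per-month counts in one pass."""
    per_day, per_week, per_month = Counter(), Counter(), Counter()
    for d in dates:  # each row is read exactly once
        iso = d.isocalendar()
        per_day[d] += 1
        per_week[(iso[0], iso[1])] += 1   # (ISO year, ISO week number)
        per_month[(d.year, d.month)] += 1
    return per_day, per_week, per_month

# Hypothetical sale dates spanning two months.
sales = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 31), date(2024, 2, 1)]
per_day, per_week, per_month = accumulate_reports(sales)
print(per_week, per_month)
```

Three separate GROUP BY queries would scan the same four rows three times. Here the cost of reading the data is paid once and only the cheap counter updates are repeated.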
Sphinx does this with a multi-query mechanism. Instead of issuing queries one by one,
you batch several queries and submit them in one request:
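<?php
require ( "sphinxapi.php" ); // PHP API client shipped with Sphinx; adjust the path as needed
$cl = new SphinxClient ();
// Query 1: matches for "ipod", most expensive first
$cl->SetSortMode ( SPH_SORT_EXTENDED, "price desc" );
$cl->AddQuery ( "ipod" );
// Query 2: the same search, grouped by category, largest groups first
$cl->SetGroupBy ( "category_id", SPH_GROUPBY_ATTR, "@count desc" );
$cl->AddQuery ( "ipod" );
// Send both queries in one request; one result set comes back per query
$results = $cl->RunQueries ();
?>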
 