Using Sphinx with MySQL - High Performance MySQL

Databases Reference

In-Depth Information

41 {

42 printf ( "%d. <a href=details.php?id=%d>%s</a>, USD %.2f<br>\n",

43 $n++, $id, $match["sql"]["title"], $match["attrs"]["price"] );

44 }

45

46 ?>

Even though the snippet just shown is pretty simple, there are a few things worth

highlighting:

• The SetLimits() call tells Sphinx to fetch only the number of rows that the client

wants to display on a page. It's cheap to impose this limit in Sphinx (unlike in

MySQL's built-in search facility), and the number of results that would have been

returned without the limit are available in $result['total_found'] at no extra cost.

• Because Sphinx only indexes the title column and doesn't store it, we must fetch

that data from MySQL.

• We retrieve data from MySQL with a single combined query for the whole docu-

ment batch using the clause WHERE id IN (...) , instead of running one query for

each match (which would be inefficient).

• We inject the rows pulled from MySQL into our full-text search result set, to keep

the original sorting order. We explain this more in a moment.

• We display the rows using data pulled from both Sphinx and MySQL.

The row injection code, which is PHP-specific, deserves a more detailed explanation.

We couldn't simply iterate over the result set from the MySQL query, because the row

order can (and in most cases actually will) be different from that specified in the WHERE

id IN (...) clause. PHP hashes (associative arrays), however, keep the order in which

the matches were inserted into them, so iterating over $result["matches"] will produce

rows in the proper sorting order as returned by Sphinx. To keep the matches in the

proper order returned from Sphinx (rather than the semirandom order returned from

MySQL), therefore, we inject the MySQL query results one by one into the hash that

PHP stores from the Sphinx result set of matches.

There are also a few major implementation and performance differences between

MySQL and Sphinx when it comes to counting matches and applying a LIMIT clause.

First, LIMIT is cheap in Sphinx. Consider a LIMIT 500,10 clause. MySQL will retrieve

510 semirandom rows (which is slow) and throw away 500, whereas Sphinx will return

the IDs that you will use to retrieve the 10 rows you actually need from MySQL. Second,

Sphinx will always return the exact number of matches it actually found in the result

set, no matter what's in the LIMIT clause. MySQL can't do this efficiently, although in

MySQL 5.6 it will have partial improvements for this limitation.

Search WWH ::

Custom Search

Home