Using Sphinx with MySQL - High Performance MySQL


incremental updates. The normal way to update the index data is to rebuild it from

scratch. This is not as big a problem as it might seem, though, for the following reasons:

• Indexing is fast. Sphinx can index plain text (without HTML markup) at a rate of

4-8 MB/sec on modern hardware.

• You can partition the data into several indexes, as shown in the next section, and

reindex only the updated part from scratch on each run of indexer.

• There is no need to “defragment” the indexes—they are built for optimal I/O,

which improves search speed.

• Numeric attributes can be updated without a complete rebuild.
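
To put the first point in perspective, here is a rough back-of-the-envelope estimate of a full rebuild at the quoted rates. The 1 GB collection size is an assumption chosen purely for illustration, and real throughput depends on the hardware and the data.

```python
def rebuild_seconds(corpus_mb: float, rate_mb_per_sec: float) -> float:
    """Time needed to index corpus_mb of plain text at a steady rate."""
    return corpus_mb / rate_mb_per_sec


# Hypothetical 1 GB (1024 MB) plain-text collection.
slow = rebuild_seconds(1024, 4)  # low end of the quoted 4-8 MB/sec range
fast = rebuild_seconds(1024, 8)  # high end of the quoted range

print(f"full rebuild: {fast / 60:.1f} to {slow / 60:.1f} minutes")
```

Under these assumptions a complete rebuild takes a few minutes, which is why rebuilding from scratch is often acceptable.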

A future version will offer an additional index backend, which will support real-time

index updates.

Typical Partition Use

Let's discuss partitioning in a bit more detail. The simplest partitioning scheme is the

main + delta approach, in which two indexes are created to index one document collection.

The main index covers the whole document set, while the delta index covers only documents that

have changed since the last time the main index was built.
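
The split between main and delta can be sketched as two queries that share a stored high-water mark. This is only an illustration: it uses an in-memory SQLite database as a stand-in for MySQL, the `documents` and `index_counter` tables and both helper functions are hypothetical names, and it tracks newly inserted rows by ID rather than arbitrary changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    -- Hypothetical helper table remembering where the last main build stopped.
    CREATE TABLE index_counter (name TEXT PRIMARY KEY, max_doc_id INTEGER);
""")
conn.executemany(
    "INSERT INTO documents (id, title, body) VALUES (?, ?, ?)",
    [(1, "first", "old post"), (2, "second", "old post"), (3, "third", "old post")],
)


def main_rows(conn):
    """Record the current high-water mark, then fetch everything up to it."""
    conn.execute(
        "REPLACE INTO index_counter (name, max_doc_id) "
        "SELECT 'main', COALESCE(MAX(id), 0) FROM documents"
    )
    return conn.execute(
        "SELECT id, title, body FROM documents WHERE id <= "
        "(SELECT max_doc_id FROM index_counter WHERE name = 'main') ORDER BY id"
    ).fetchall()


def delta_rows(conn):
    """Fetch only the rows added since the last main build."""
    return conn.execute(
        "SELECT id, title, body FROM documents WHERE id > "
        "(SELECT max_doc_id FROM index_counter WHERE name = 'main') ORDER BY id"
    ).fetchall()


main = main_rows(conn)  # simulates a full rebuild of main
conn.execute("INSERT INTO documents (id, title, body) VALUES (4, 'fourth', 'new post')")
delta = delta_rows(conn)  # simulates a cheap rebuild of delta

print([row[0] for row in main])   # IDs covered by main
print([row[0] for row in delta])  # IDs covered by delta
```

Because the delta query reads only rows above the stored mark, its cost grows with the amount of new data rather than with the size of the whole collection.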

This scheme matches many data modification patterns perfectly. Forums, blogs, email

and news archives, and vertical search engines are all good examples. Most of the data

in those repositories never changes once it is entered, and only a tiny fraction of documents

are changed or added on a regular basis. This means the delta index is small and

can be rebuilt as frequently as required (e.g., once every 1-15 minutes). This is equivalent

to indexing just the newly inserted rows.

You don't need to rebuild the indexes to change attributes associated with

documents; you can do this online via searchd. You can mark rows as deleted by

simply setting a “deleted” attribute in the main index. Thus, you can handle updates

by marking the outdated documents as deleted in the main index, then rebuilding the delta

index so that it picks up their new versions. Searching for all documents that are not marked as “deleted” will return the

correct results.
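
The update flow just described can be modelled with two dictionaries. This is a toy in-memory sketch, not the Sphinx API: the index layout, `update_document`, and `search` are hypothetical names introduced only to show why filtering on the “deleted” attribute yields correct results.

```python
# Each index is modelled as {doc_id: {"text": ..., "deleted": 0 or 1}}.
main_index = {
    1: {"text": "sphinx indexing basics", "deleted": 0},
    2: {"text": "sphinx delta tricks", "deleted": 0},
}
delta_index = {}


def update_document(doc_id, new_text):
    """Flag the stale copy in main, then put the new version into delta."""
    if doc_id in main_index:
        # Stands in for an online attribute update; main is not rebuilt.
        main_index[doc_id]["deleted"] = 1
    # Stands in for rebuilding the small delta index.
    delta_index[doc_id] = {"text": new_text, "deleted": 0}


def search(word):
    """Search both indexes and keep only documents not marked as deleted."""
    hits = []
    for index_name, index in (("main", main_index), ("delta", delta_index)):
        for doc_id, doc in sorted(index.items()):
            if not doc["deleted"] and word in doc["text"].split():
                hits.append((index_name, doc_id))
    return hits


update_document(2, "sphinx delta and main tricks")
print(search("sphinx"))
```

After the update, document 2 is found only through delta, because its old copy in main is filtered out by the “deleted” attribute.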

Note that the indexed data can come from the results of any SELECT statement; it doesn't

have to come from just a single SQL table. There are no restrictions on the SELECT

statements. That means you can preprocess the results in the database before they're

indexed. Common preprocessing examples include joins with other tables, creating

additional fields on the fly, excluding some fields from indexing, and manipulating

values.
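
As an illustration of this kind of preprocessing, the sketch below builds the rows to be indexed with a single SELECT. It assumes an in-memory SQLite database in place of MySQL, and the `posts` and `authors` tables, their columns, and the sample row are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, author_id INTEGER,
        title TEXT, body TEXT, internal_notes TEXT, created TEXT
    );
    INSERT INTO authors VALUES (1, 'Ann');
    INSERT INTO posts VALUES
        (10, 1, 'Hello', 'First post', 'do not index', '2008-03-01 12:00:00');
""")

# Join with another table, create a field on the fly, leave internal_notes
# out of the result, and turn the timestamp into an integer.
INDEX_QUERY = """
    SELECT p.id,
           p.title || ' ' || p.body AS content,
           a.name AS author,
           CAST(strftime('%s', p.created) AS INTEGER) AS created_ts
    FROM posts AS p
    JOIN authors AS a ON a.id = p.author_id
"""

rows = conn.execute(INDEX_QUERY).fetchall()
print(rows)
```

Only the columns named in the SELECT reach the indexer, so excluding a field is simply a matter of not selecting it.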
