incremental updates. The normal way to update the index data is to rebuild it from
scratch. This is not as big a problem as it might seem, though, for the following reasons:
• Indexing is fast. Sphinx can index plain text (without HTML markup) at a rate of
4-8 MB/sec on modern hardware.
• You can partition the data into several indexes, as shown in the next section, and
reindex only the updated part from scratch on each run of indexer.
• There is no need to “defragment” the indexes—they are built for optimal I/O,
which improves search speed.
• Numeric attributes can be updated without a complete rebuild.
A future version will offer an additional index backend, which will support real-time
index updates.
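To put the indexing rate quoted above in perspective, here is a rough back-of-the-envelope sketch. The function name and the 1 GB corpus size are illustrative assumptions; only the 4-8 MB/sec range comes from the text, and real throughput depends on hardware and data.

```python
def rebuild_time_range(corpus_mb, slow_rate=4.0, fast_rate=8.0):
    """Estimate how long a full rebuild takes, in seconds.

    corpus_mb  -- size of the plain-text corpus in megabytes (assumed value)
    slow_rate  -- lower bound of the quoted indexing rate, in MB/sec
    fast_rate  -- upper bound of the quoted indexing rate, in MB/sec

    Returns a (best_case, worst_case) tuple of seconds.
    """
    best_case = corpus_mb / fast_rate
    worst_case = corpus_mb / slow_rate
    return (best_case, worst_case)


if __name__ == "__main__":
    # Hypothetical 1 GB (1024 MB) plain-text corpus.
    best, worst = rebuild_time_range(1024)
    print(f"Full rebuild: roughly {best:.0f} to {worst:.0f} seconds")
```

Under these assumptions, a 1 GB corpus rebuilds in about two to four minutes, which is why rebuilding from scratch is often acceptable.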
Typical Partition Use
Let's discuss partitioning in a bit more detail. The simplest partitioning scheme is the
main + delta approach, in which two indexes are created to index one document
collection. main indexes the whole document set, while delta indexes only documents that
have changed since the last time the main index was built.
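The split described above can be sketched in a few lines. This is only a model of the idea, not Sphinx's own mechanism: the Document class, the updated_at field, and the main_built_at marker are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    text: str
    updated_at: int  # hypothetical modification timestamp


def select_delta(documents, main_built_at):
    """Return the documents that belong in the delta index.

    A document goes into delta if it was added or changed after the
    moment the main index was last built (main_built_at, an assumed
    timestamp recorded at build time).
    """
    return [d for d in documents if d.updated_at > main_built_at]


if __name__ == "__main__":
    docs = [
        Document(1, "old forum post", 10),
        Document(2, "edited forum post", 20),
        Document(3, "brand new post", 30),
    ]
    # main was built at time 15, so documents 2 and 3 go into delta.
    print([d.doc_id for d in select_delta(docs, main_built_at=15)])
```

The main index still covers the whole set as of its build time; only the small delta set needs to be reindexed on each run.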
This scheme matches many data modification patterns perfectly. Forums, blogs, email
and news archives, and vertical search engines are all good examples. Most of the data
in those repositories never changes once it is entered, and only a tiny fraction of
documents are changed or added on a regular basis. This means the delta index is small and
can be rebuilt as frequently as required (e.g., once every 1-15 minutes). This is
equivalent to indexing just the newly inserted rows.
You don't need to rebuild the indexes to change attributes associated with
documents; you can do this online via searchd. You can mark rows as deleted by
simply setting a “deleted” attribute in the main index. Thus, you can handle updates
by setting this attribute on the changed documents in the main index, then rebuilding
the delta index so that it contains their new versions. Searching for all documents
that are not marked as “deleted” will then return the correct results.
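The following sketch models this update flow with plain dictionaries standing in for the two indexes. It does not use the Sphinx API; apply_update and search are hypothetical helpers, and the word-matching logic is a deliberately naive stand-in for full-text search.

```python
def apply_update(main, delta, doc_id, new_text):
    """Record a changed document without rebuilding the main index.

    The stale copy in main is flagged via its "deleted" attribute
    (an in-place attribute change), and the new version is placed
    in delta, which is cheap to rebuild.
    """
    if doc_id in main:
        main[doc_id]["deleted"] = True
    delta[doc_id] = {"text": new_text, "deleted": False}


def search(main, delta, word):
    """Search both indexes, skipping documents marked as deleted."""
    hits = []
    for index in (main, delta):
        for doc_id, doc in index.items():
            if not doc["deleted"] and word in doc["text"].split():
                hits.append((doc_id, doc["text"]))
    return sorted(hits)


if __name__ == "__main__":
    main = {
        1: {"text": "old cat", "deleted": False},
        2: {"text": "dog", "deleted": False},
    }
    delta = {}
    apply_update(main, delta, 1, "new cat")
    # Only the fresh version of document 1 is returned.
    print(search(main, delta, "cat"))
```

Because the stale copy is filtered out by its attribute rather than physically removed, the main index stays untouched until its next scheduled rebuild.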
Note that the indexed data can come from the results of any SELECT statement; it doesn't
have to come from just a single SQL table. There are no restrictions on the SELECT
statements. That means you can preprocess the results in the database before they're
indexed. Common preprocessing examples include joins with other tables, creating
additional fields on the fly, excluding some fields from indexing, and manipulating
values.
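As an illustration of this kind of preprocessing, the sketch below runs a SELECT that joins two tables, computes an extra column, and leaves out columns that should not be indexed. It uses Python's built-in sqlite3 module purely so the example is runnable; the posts and users tables and all column names are invented for the demonstration, and a real data source would issue a similar query against its own database.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER,
            title TEXT,
            body TEXT,
            is_draft INTEGER
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO posts VALUES
            (1, 1, 'Hello', 'first post', 0),
            (2, 2, 'Draft', 'not ready', 1),
            (3, 2, 'News', 'sphinx indexing', 0);
    """)
    return conn


def fetch_documents(conn):
    """Return the rows that would be handed to the indexer."""
    query = """
        SELECT p.id,
               p.title,
               p.body,
               u.name         AS author_name,  -- joined from another table
               LENGTH(p.body) AS body_length   -- field created on the fly
        FROM posts AS p
        JOIN users AS u ON u.id = p.author_id
        WHERE p.is_draft = 0                   -- drafts are excluded
        ORDER BY p.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in fetch_documents(build_demo_db()):
        print(row)
```

The author_id and is_draft columns never reach the result set, so they are excluded from indexing, while author_name and body_length exist only in the query output.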
 