benefits as MySQL 5.1's partitioned tables, which can also partition data into multiple
locations. However, distributed indexes have some advantages over partitioned tables.
Sphinx uses distributed indexes both to distribute the load and to process all parts of
a query in parallel. In contrast, MySQL's partitioning can optimize some queries (but
not all) by pruning partitions, but the query processing will not be parallelized. And
even though both Sphinx and MySQL partitioning will improve query throughput, if
your queries are I/O-bound, you can expect linear latency improvement from Sphinx
on all queries, whereas MySQL's partitioning will improve latency only on those queries
where the optimizer can prune entire partitions.
The distributed searching workflow is straightforward:
1. Issue remote queries on all remote servers.
2. Perform sequential local index searches.
3. Read the partial search results from the remote servers.
4. Merge all the partial results into the final result set, and return it to the client.
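The four steps above can be sketched in a few lines of Python. This is a hypothetical illustration of the workflow, not the Sphinx API: `remote_agents` stands in for the configured agents, `search_local` for the local index scan, and results are modeled as `(doc_id, weight)` pairs.

```python
# Sketch of the distributed searching workflow (illustrative only;
# agent callables and index structures are invented stand-ins).
from concurrent.futures import ThreadPoolExecutor

def search_local(index, query):
    # Placeholder local search: each "index" maps a query
    # to a list of (doc_id, weight) pairs.
    return list(index.get(query, []))

def distributed_search(remote_agents, local_indexes, query, limit=20):
    with ThreadPoolExecutor() as pool:
        # 1. Issue the query to all remote agents concurrently.
        remote_futures = [pool.submit(agent, query) for agent in remote_agents]
        # 2. Search the local indexes while the remote queries run.
        partial = []
        for index in local_indexes:
            partial.extend(search_local(index, query))
        # 3. Collect the partial results from the remote agents.
        for fut in remote_futures:
            partial.extend(fut.result())
    # 4. Merge all partial results into the final set, best weight first.
    partial.sort(key=lambda hit: hit[1], reverse=True)
    return partial[:limit]
```

The point of the overlap between steps 1 and 2 is that the local searches cost nothing extra in wall-clock time while the network round trips are in flight.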
If your hardware resources permit it, you can search through several indexes on the
same machine in parallel, too. If there are several physical disk drives and several CPU
cores, the concurrent searches can run without interfering with each other. You can
pretend that some of the indexes are remote and configure searchd to contact itself to
launch a parallel query on the same machine:
index distributed_sample
{
    type  = distributed
    local = chunk1                  # resides on HDD1
    agent = localhost:3312:chunk2   # resides on HDD2; searchd contacts itself
}
From the client's point of view, distributed indexes are no different from local
indexes. This lets you create “trees” of distributed indexes by using nodes as
proxies for sets of other nodes. For example, a first-level node could proxy
queries to a number of second-level nodes, which could in turn either search locally
themselves or pass the queries on to other nodes, to an arbitrary depth.
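Such a tree might look like the following configuration sketch. The host names and index names here are illustrative, not taken from the text; the pattern is simply that an agent on one node can itself point at another distributed index.

```
# First-level node: proxies every query to two second-level nodes.
index dist_root
{
    type  = distributed
    agent = node1.example.com:3312:dist_node1
    agent = node2.example.com:3312:dist_node2
}

# On node1, dist_node1 is itself distributed: it searches one
# chunk locally and delegates another further down the tree.
index dist_node1
{
    type  = distributed
    local = chunk1
    agent = node3.example.com:3312:chunk2
}
```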
Aggregating Sharded Data
Building a scalable system often involves sharding (partitioning) the data across
different physical MySQL servers. We discussed this in depth in Chapter 11.
When the data is sharded at a fine level of granularity, simply fetching a few rows with
a selective WHERE (which should be fast) means contacting many servers, checking for
errors, and merging the results together in the application. Sphinx alleviates this
problem, because all the necessary functionality is already implemented inside the
search daemon.
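To see what the search daemon is saving the application from, here is a minimal sketch of the merge step an application would otherwise implement itself: combining already-sorted partial result sets from several shards. The shard data and row shape (`(doc_id, weight)` pairs, sorted by weight descending) are invented for illustration.

```python
import heapq

def merge_shard_results(shard_results, limit=10):
    # Each shard returns rows already sorted by weight descending;
    # heapq.merge combines the sorted streams without re-sorting
    # everything, then we keep only the top `limit` rows.
    merged = heapq.merge(*shard_results, key=lambda row: row[1], reverse=True)
    return [row for _, row in zip(range(limit), merged)]
```

This is only the happy path; the real application code would also have to open connections to every shard, handle per-shard errors and timeouts, which is exactly the bookkeeping Sphinx performs internally.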