benefits as MySQL 5.1's partitioned tables, which can also partition data into multiple
locations. However, distributed indexes have some advantages over partitioned tables.
Sphinx uses distributed indexes both to distribute the load and to process all parts of
a query in parallel. In contrast, MySQL's partitioning can optimize some queries (but
not all) by pruning partitions, but the query processing will not be parallelized. And
even though both Sphinx and MySQL partitioning will improve query throughput, if
your queries are I/O-bound, you can expect linear latency improvement from Sphinx
on all queries, whereas MySQL's partitioning will improve latency only on those queries
where the optimizer can prune entire partitions.
The distributed searching workflow is straightforward:
1. Issue remote queries on all remote servers.
2. Perform sequential local index searches.
3. Read the partial search results from the remote servers.
4. Merge all the partial results into the final result set, and return it to the client.
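The four steps above can be sketched in a few lines of Python. This is a hypothetical illustration of the workflow, not the Sphinx API: `remote_agents` stands in for the configured agents, `search_local` for the local index scan, and results are modeled as `(doc_id, weight)` pairs.

```python
# Sketch of the distributed searching workflow (illustrative only;
# agent callables and index structures are invented stand-ins).
from concurrent.futures import ThreadPoolExecutor

def search_local(index, query):
    # Placeholder local search: each "index" maps a query
    # to a list of (doc_id, weight) pairs.
    return list(index.get(query, []))

def distributed_search(remote_agents, local_indexes, query, limit=20):
    with ThreadPoolExecutor() as pool:
        # 1. Issue the query to all remote agents concurrently.
        remote_futures = [pool.submit(agent, query) for agent in remote_agents]
        # 2. Search the local indexes while the remote queries run.
        partial = []
        for index in local_indexes:
            partial.extend(search_local(index, query))
        # 3. Collect the partial results from the remote agents.
        for fut in remote_futures:
            partial.extend(fut.result())
    # 4. Merge all partial results into the final set, best weight first.
    partial.sort(key=lambda hit: hit[1], reverse=True)
    return partial[:limit]
```

The point of the overlap between steps 1 and 2 is that the local searches cost nothing extra in wall-clock time while the network round trips are in flight.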
If your hardware resources permit it, you can search through several indexes on the
same machine in parallel, too. If there are several physical disk drives and several CPU
cores, the concurrent searches can run without interfering with each other. You can
pretend that some of the indexes are remote and configure searchd to contact itself to
launch a parallel query on the same machine:
index distributed_sample
{
    type  = distributed
    local = chunk1                  # resides on HDD1
    agent = localhost:3312:chunk2   # resides on HDD2; searchd contacts itself
}
From the client's point of view, distributed indexes are no different from local
indexes. This lets you create “trees” of distributed indexes by using nodes as
proxies for sets of other nodes. For example, a first-level node could proxy
queries to a number of second-level nodes, which could in turn either search locally
themselves or pass the queries on to other nodes, to an arbitrary depth.
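Such a tree might look like the following configuration sketch. The host names and index names here are illustrative, not taken from the text; the pattern is simply that an agent on one node can itself point at another distributed index.

```
# First-level node: proxies every query to two second-level nodes.
index dist_root
{
    type  = distributed
    agent = node1.example.com:3312:dist_node1
    agent = node2.example.com:3312:dist_node2
}

# On node1, dist_node1 is itself distributed: it searches one
# chunk locally and delegates another further down the tree.
index dist_node1
{
    type  = distributed
    local = chunk1
    agent = node3.example.com:3312:chunk2
}
```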
Aggregating Sharded Data
Building a scalable system often involves sharding (partitioning) the data across
different physical MySQL servers. We discussed this in depth in Chapter 11.
When the data is sharded at a fine level of granularity, simply fetching a few rows with
a selective WHERE (which should be fast) means contacting many servers, checking for
errors, and merging the results together in the application. Sphinx alleviates this
problem, because all the necessary functionality is already implemented inside the
search daemon.
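To see what the search daemon is saving the application from, here is a minimal sketch of the merge step an application would otherwise implement itself: combining already-sorted partial result sets from several shards. The shard data and row shape (`(doc_id, weight)` pairs, sorted by weight descending) are invented for illustration.

```python
import heapq

def merge_shard_results(shard_results, limit=10):
    # Each shard returns rows already sorted by weight descending;
    # heapq.merge combines the sorted streams without re-sorting
    # everything, then we keep only the top `limit` rows.
    merged = heapq.merge(*shard_results, key=lambda row: row[1], reverse=True)
    return [row for _, row in zip(range(limit), merged)]
```

This is only the happy path; the real application code would also have to open connections to every shard, handle per-shard errors and timeouts, which is exactly the bookkeeping Sphinx performs internally.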