Why have four searchd instances per node? Why not have only one searchd instance
per server, configure it to carry four index chunks, and make it contact itself as though
it's a remote server to utilize multiple CPUs, as we suggested earlier? Having four
instances instead of just one has its benefits. First, it reduces startup time. There are
several gigabytes of attribute data that need to be preloaded in RAM; starting several
daemons at a time lets us parallelize that. Second, it improves availability. In the event
of searchd failures or updates, only 1/24 of the whole index is inaccessible, instead
of 1/6.
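To make the topology concrete, here is a minimal sphinx.conf sketch of how a first-tier
aggregator might address those per-node instances as remote agents. The host names, ports,
chunk names, and timeouts below are hypothetical, not BoardReader's actual settings:

    # First-tier aggregator: one distributed index whose agents are the
    # per-node searchd instances (four per search node, one per port).
    # Host names, ports, and chunk names are hypothetical.
    index dist_all
    {
        type  = distributed

        # search node 1: four searchd instances on separate ports
        agent = searchnode1:9312:chunk01
        agent = searchnode1:9313:chunk02
        agent = searchnode1:9314:chunk03
        agent = searchnode1:9315:chunk04

        # search node 2, and so on for the remaining nodes
        agent = searchnode2:9312:chunk05
        agent = searchnode2:9313:chunk06

        # fail fast if an agent is down or slow (milliseconds)
        agent_connect_timeout = 1000
        agent_query_timeout   = 3000
    }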
Within each of the 24 instances on the search cluster, we used time-based partitioning
to reduce the load even further. Many queries need to be run only on the most recent
data, so the data is divided into three disjoint index sets: data from the last week, from
the last three months, and from all time. These indexes are distributed over several
different physical disks on a per-instance basis. This way, each instance has its own
CPU and physical disk drive and won't interfere with the others.
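A sketch of what one instance's local index definitions might look like, with each
time-based index placed on its own disk, follows; the index names, sources, and mount
points are assumptions for illustration:

    # One searchd instance's local indexes, one per physical disk.
    # Index names, sources, and mount points are hypothetical.
    index posts_lastweek
    {
        source = src_lastweek
        path   = /mnt/disk1/sphinx/posts_lastweek
    }

    index posts_3months
    {
        source = src_3months
        path   = /mnt/disk2/sphinx/posts_3months
    }

    index posts_alltime
    {
        source = src_alltime
        path   = /mnt/disk3/sphinx/posts_alltime
    }

A query that needs only the last week of data is sent to the smallest index alone, so it
reads the least data from its own dedicated disk.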
Local cron jobs update the indexes periodically. They pull the data from MySQL over
the network but create the index files locally.
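For example, the data source in sphinx.conf points at the remote MySQL server, and
crontab entries on the search node run indexer to rebuild each index on its own schedule.
The connection details, query, and schedule below are hypothetical:

    # Hypothetical source definition: rows are pulled from MySQL over the
    # network, but the resulting index files are written locally.
    source src_lastweek
    {
        type      = mysql
        sql_host  = mysql1.example.com
        sql_user  = sphinx
        sql_pass  = secret
        sql_db    = boards
        sql_query = SELECT id, forum_id, UNIX_TIMESTAMP(posted) AS posted, body \
                    FROM posts WHERE posted > NOW() - INTERVAL 7 DAY
        sql_attr_uint      = forum_id
        sql_attr_timestamp = posted
    }

    # Hypothetical crontab on the search node: --rotate builds the new index
    # files alongside the old ones, then signals searchd to swap them in.
    */15 *  * * *   /usr/local/bin/indexer --quiet --rotate posts_lastweek
    30   *  * * *   /usr/local/bin/indexer --quiet --rotate posts_3months
    15   4  * * *   /usr/local/bin/indexer --quiet --rotate posts_alltime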
Using several explicitly separated “raw” disks proved to be faster than a single RAID
volume. Raw disks give control over which files go on which physical disk. That is not
the case with RAID, where the controller decides which block goes on which physical
disk. Raw disks also guarantee fully parallel I/O on different index chunks, whereas
concurrent searches on RAID are subject to I/O stepping. We chose RAID 0, which has no
redundancy, because we don't care about disk failures; we can easily rebuild the indexes
on the search nodes. We could also have used several RAID 1 (mirror) volumes to give
the same throughput as raw disks while improving reliability.
Another interesting thing to learn from BoardReader is how Sphinx version updates
are performed. Obviously, the whole cluster cannot be taken down. Therefore,
backward compatibility is critical. Fortunately, Sphinx provides it—newer searchd versions
can usually read older index files, and they are always able to communicate with older
clients over the network. Note that the first-tier nodes that aggregate the search results
look just like clients to the second-tier nodes, which do most of the actual searching.
Thus, the second-tier nodes are updated first, then the first-tier ones, and finally the
web frontend.
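A rough shell sketch of that upgrade order follows; the host names, install step, and
config paths are assumptions, and each searchd instance on a node is stopped and restarted
with its own --config file:

    #!/bin/sh
    # Hypothetical rolling upgrade: second-tier searchers first, then the
    # first-tier aggregators, and finally the web frontend's client API.
    upgrade_host() {
        host=$1; shift
        ssh "$host" "tar xzf /tmp/sphinx-new.tar.gz -C /usr/local"  # install new binaries
        for conf in "$@"; do
            # restart one searchd instance at a time, so only a single
            # index chunk on the node is unavailable at any moment
            ssh "$host" "searchd --config $conf --stop && searchd --config $conf"
        done
    }

    # 1. second-tier nodes (they do the actual searching)
    for host in searchnode1 searchnode2; do
        upgrade_host "$host" \
            /usr/local/etc/sphinx-1.conf /usr/local/etc/sphinx-2.conf \
            /usr/local/etc/sphinx-3.conf /usr/local/etc/sphinx-4.conf
    done

    # 2. first-tier aggregators (they look like clients to the second tier)
    for host in aggregator1 aggregator2; do
        upgrade_host "$host" /usr/local/etc/sphinx.conf
    done

    # 3. finally, deploy the updated client API on the web frontend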
Lessons learned from this example are:
• The Very Large Database Motto: partition, partition, partition, parallelize.
• On big search farms, organize searchd in trees with several tiers.
• Build optimized indexes with a fraction of the total data where possible.
• Map files to disks explicitly rather than relying on the RAID controller.
 