Databases Reference
In-Depth Information
Sphinx has some good geospatial search functions built in, which can
be a lot better than using MySQL. And in case you're thinking of using
MyISAM's GIS functions for the techniques shown in this section, take
our word for it: they don't work much better, and MyISAM itself just
doesn't work well for large, high-traffic applications, for all the usual
reasons: data corruption, table-level locking, and so on.
To recap this case study, we covered the usual optimization strategies:
• Stop doing things, or do them less often. Don't run your entire dataset through the
great-circle formula; trim it down first with a cheaper technique, and then run the
expensive formula on a smaller set of rows.
• Do things more quickly. Make sure you design the system to be able to use indexes
effectively, as discussed in the previous chapter, and use approximations (the earth
is flat, and a square is an approximation of a circle) sensibly to avoid needless
precision.
• Pull the work out into the application as much as you can. Get those expensive
trigonometry functions out of SQL and into the application code!
Using User-Defined Functions
Our last advanced query optimization illustrates when SQL just isn't the right tool for
the job. When you need raw speed, nothing beats C or C++ code. Of course, you have
to be able to program in C or C++ well enough not to destroy your server. With great
power comes great responsibility.
We'll show you how to write your own user-defined functions (UDFs) in the next
chapter, but we thought it would be a good idea to mention a real use case for a UDF
in this chapter. The project requirement from the customer was as follows: “We need
to run a matching query, which is basically an XOR operation between two random 64-
byte long data strings, against 35 million records in less than few seconds.” A little
calculation showed that this just can't be done inside MySQL with currently available
hardware. How to solve this problem?
The answer was a program that Yves Trudeau wrote, which takes advantage of the
SSE4.2 instruction set. It runs as a daemon on many commodity servers, and the
MySQL server communicates with it over a simple network protocol via a UDF written
in C. Yves benchmarked the distributed program running matches against 4 million
strings in 130 milliseconds. By taking the problem out of MySQL and making MySQL
talk to the distributed daemon, the customer was able to keep things simple for the
application, so that it can continue acting as if MySQL is doing all the work. As they
say on Twitter, #winning! This is an example of optimizing for the business, not just
for the technical aspects of the problem.
 
Search WWH ::




Custom Search