Query Performance Optimization - High Performance MySQL

Databases Reference

In-Depth Information

Sphinx has some good geospatial search functions built in, which can

be a lot better than using MySQL. And in case you're thinking of using

MyISAM's GIS functions for the techniques shown in this section, take

our word for it: they don't work much better, and MyISAM itself just

doesn't work well for large, high-traffic applications, for all the usual

reasons: data corruption, table-level locking, and so on.

To recap this case study, we covered the usual optimization strategies:

• Stop doing things, or do them less often. Don't run your entire dataset through the

great-circle formula; trim it down first with a cheaper technique, and then run the

expensive formula on a smaller set of rows.

• Do things more quickly. Make sure you design the system to be able to use indexes

effectively, as discussed in the previous chapter, and use approximations (the earth

is flat, and a square is an approximation of a circle) sensibly to avoid needless

precision.

• Pull the work out into the application as much as you can. Get those expensive

trigonometry functions out of SQL and into the application code!

Using User-Defined Functions

Our last advanced query optimization illustrates when SQL just isn't the right tool for

the job. When you need raw speed, nothing beats C or C++ code. Of course, you have

to be able to program in C or C++ well enough not to destroy your server. With great

power comes great responsibility.

We'll show you how to write your own user-defined functions (UDFs) in the next

chapter, but we thought it would be a good idea to mention a real use case for a UDF

in this chapter. The project requirement from the customer was as follows: “We need

to run a matching query, which is basically an XOR operation between two random 64-

byte long data strings, against 35 million records in less than few seconds.” A little

calculation showed that this just can't be done inside MySQL with currently available

hardware. How to solve this problem?

The answer was a program that Yves Trudeau wrote, which takes advantage of the

SSE4.2 instruction set. It runs as a daemon on many commodity servers, and the

MySQL server communicates with it over a simple network protocol via a UDF written

in C. Yves benchmarked the distributed program running matches against 4 million

strings in 130 milliseconds. By taking the problem out of MySQL and making MySQL

talk to the distributed daemon, the customer was able to keep things simple for the

application, so that it can continue acting as if MySQL is doing all the work. As they

say on Twitter, #winning! This is an example of optimizing for the business, not just

for the technical aspects of the problem.

Search WWH ::

Custom Search

Home