Background
A large portion of the effort that goes into database research and
engineering development deals with improving query performance. Much
of that work is predicated on the knowledge that some things are slow and
others are fast: reading from disk is slow, whereas reading from memory is
fast; seeking to a new spot on the disk is slow, but a sequential scan is fast.
Processors can compute values quickly, but it is even better to precompute
values that you're going to need.
One of the cardinal sins of database design is the table scan, which is what
you resort to when the query optimizer gives up and can't figure out a fast
way to execute your query. A table scan does exactly what it sounds like: it
scans the entire table by reading it one row at a time. On many database
systems, table scans are not only slow but also risk slowing down other
operations because they keep the disk busy. A significant portion of the
complexity in a modern database system is designed to avoid table scans at
all costs.
The designers of Dremel asked three questions: “Why does a table scan have
to be slow?”, “What would it take to make a table scan fast?”, and “If we do
make a table scan fast, does that make other things easier?” They set a goal
of performing a table scan over a 1 TB table in less than 1 second.
Achieving a processing rate of a terabyte per second is tough. A standard
hard disk can read approximately 100 MB per second. (That is probably a
bit on the high side, but it is in the right ballpark.) If you have a hard disk
and want to read a terabyte from it, it is going to take you 10,000 seconds,
or approximately 3 hours.
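
The disk arithmetic above can be checked with a short calculation. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 100 MB = 10^8 bytes), and the function names are made up for illustration. The second function extends the same reasoning to ask how many such disks would have to be read in parallel to meet the 1-second goal, assuming the data is spread perfectly evenly across them.

```python
import math

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def single_disk_scan_seconds(table_bytes, disk_bytes_per_sec):
    """Time to read the whole table sequentially from one disk."""
    return table_bytes / disk_bytes_per_sec


def disks_for_target(table_bytes, disk_bytes_per_sec, target_seconds):
    """Disks needed if the table is split evenly and read in parallel."""
    return math.ceil(table_bytes / (disk_bytes_per_sec * target_seconds))


scan_seconds = single_disk_scan_seconds(1 * TB, 100 * MB)
scan_hours = scan_seconds / 3600
parallel_disks = disks_for_target(1 * TB, 100 * MB, target_seconds=1)

print(f"one disk: {scan_seconds:,.0f} s (about {scan_hours:.1f} hours)")
print(f"disks needed for a 1-second scan: {parallel_disks:,}")
```

Under these assumptions, a single disk needs 10,000 seconds, so meeting the goal by parallelism alone would take on the order of 10,000 disks.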
Moreover, if you're going to do interesting queries, but don't have indexes,
you're going to need a lot of processing capacity. If your 1 TB table has
256 bytes per row, it holds roughly 4 billion rows. If each CPU can process
1 million rows per second, processing the whole table in one second would
take about 4,000 CPUs.
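
The CPU estimate follows from two steps: the table size divided by the row size gives the row count, and the row count divided by the per-CPU rate gives the number of CPUs. The sketch below works through those steps, again assuming a decimal terabyte (10^12 bytes); the function names are illustrative only, and the rate of 1 million rows per second per CPU is the figure assumed in the text, not a measurement.

```python
import math


def rows_in_table(table_bytes, bytes_per_row):
    """Number of whole rows that fit in a table of the given size."""
    return table_bytes // bytes_per_row


def cpus_for_target(row_count, rows_per_sec_per_cpu, target_seconds):
    """CPUs needed to process every row within the target time."""
    return math.ceil(row_count / (rows_per_sec_per_cpu * target_seconds))


table_bytes = 10**12  # assumption: 1 TB in decimal units
row_count = rows_in_table(table_bytes, bytes_per_row=256)
cpu_count = cpus_for_target(row_count, rows_per_sec_per_cpu=1_000_000,
                            target_seconds=1)

print(f"rows: {row_count:,}")         # about 3.9 billion
print(f"CPUs needed: {cpu_count:,}")  # about 4,000
```

With a binary terabyte (2^40 bytes) the row count is 2^32, roughly 4.3 billion, so either convention lands near the 4,000-CPU figure.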
The combination of innovative software design, scale-out architecture, and
Google's massive hardware infrastructure enabled the Dremel team to
achieve its goal of taming the table scan. The architecture sections in this
chapter describe how they did it. If you prefer to read the technical writeup,
a research paper introducing Dremel is available from the Google Research
website at http://research.google.com/pubs/pub36632.html.