Database Reference
In-Depth Information
scanning for values within a range is just a sequential in-order traversal of
the tree.
Relational Database Optimization
Relational databases generally have primary and secondary indexes. Both
of these are B-Trees. The only difference is that the primary index also
generally stores the data within the B-Tree, which is a performance
optimization to save additional disk I/O. Secondary indexes are additional
B-trees that point to the data in the primary key.
In addition to the B-Tree, relational databases often store extra information
about the table that can help when planning the best way to run the query.
This information can be stored inside the B-Tree or in additional table
metadata. For example, a database often stores the number of rows in a
table and an approximation of the cardinality, or number of distinct values
in each field. This metadata can be useful to avoid reading unnecessary data
or deciding which index to use.
When a query is run, the query planner module makes an educated guess
about the fastest way to find the data needed in the query. For instance,
if there is a WHERE clause restricting the data on column foo , the query
planner first tries to find an index on column foo . If there is no suitable
index, the query has to do a full table scan, which can be extremely slow.
This means that all the data in the table has to be read, even if the WHERE
restriction picked out only a single row. This is why DBAs get paid a lot of
money; they look at which types of queries get run and make sure there are
indexes on the right columns.
Comparative Analysis
You may have noticed that a B-Tree in a relational database looks a lot
like the computational tree in BigQuery; that isn't a coincidence. BigQuery's
computation tree can be thought of as a B-tree that is computed on-the-fly.
Whereas in a relational database, a table scan is punishment for not having
an index, BigQuery always does the table scan and builds the index in
parallel. The benefit of BigQuery is that table scans are optimized and
ad-hoc queries just work, with no indexes or optimization required.
There are some things that a relational database does better. Lookups of
single rows from the primary key or secondary keys are extremely fast.
When data is changing, not just being appended to, a relational database
Search WWH ::




Custom Search