Understanding Query Execution - Google BigQuery Analytics

Database Reference

In-Depth Information

scanning for values within a range is just a sequential in-order traversal of

the tree.

Relational Database Optimization

Relational databases generally have primary and secondary indexes. Both

of these are B-Trees. The only difference is that the primary index also

generally stores the data within the B-Tree, which is a performance

optimization to save additional disk I/O. Secondary indexes are additional

B-trees that point to the data in the primary key.

In addition to the B-Tree, relational databases often store extra information

about the table that can help when planning the best way to run the query.

This information can be stored inside the B-Tree or in additional table

metadata. For example, a database often stores the number of rows in a

table and an approximation of the cardinality, or number of distinct values

in each field. This metadata can be useful to avoid reading unnecessary data

or deciding which index to use.

When a query is run, the query planner module makes an educated guess

about the fastest way to find the data needed in the query. For instance,

if there is a WHERE clause restricting the data on column foo , the query

planner first tries to find an index on column foo . If there is no suitable

index, the query has to do a full table scan, which can be extremely slow.

This means that all the data in the table has to be read, even if the WHERE

restriction picked out only a single row. This is why DBAs get paid a lot of

money; they look at which types of queries get run and make sure there are

indexes on the right columns.

Comparative Analysis

You may have noticed that a B-Tree in a relational database looks a lot

like the computational tree in BigQuery; that isn't a coincidence. BigQuery's

computation tree can be thought of as a B-tree that is computed on-the-fly.

Whereas in a relational database, a table scan is punishment for not having

an index, BigQuery always does the table scan and builds the index in

parallel. The benefit of BigQuery is that table scans are optimized and

ad-hoc queries just work, with no indexes or optimization required.

There are some things that a relational database does better. Lookups of

single rows from the primary key or secondary keys are extremely fast.

When data is changing, not just being appended to, a relational database

Search WWH ::

Custom Search

Home