Understanding Query Execution - Google BigQuery Analytics

Dremel supports an alternative query mode called materialize where the

shards write out the results directly and in parallel. This, of course, means

that the entire query must be parallelizable. Top-level ORDER BY operations

may be slow (because a sort must be computed globally), and queries may

end up taking substantially longer to complete because writing data to

permanent storage is slower than returning it over the network. Despite

these limitations, materialize queries are extremely powerful—they can

write out hundreds of gigabytes in only a few seconds.

For most queries, however, the user who issued them cares only about the

first few results. In the BigQuery Web UI, for example, few people fetch

the second page of results, let alone the last one. If you run a SELECT *

query to look at the first few rows of a table and you write out 10 TB and

wait 5 minutes, you'll be wasting a lot of your time and Google's processing

resources.

To prevent you from accidentally generating massive results, materialize

queries are not the default. If you want a massive result set, you need to set

the flag allowLargeResults in the query and specify a destination table.
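
As a rough illustration of the two settings just described, the sketch below builds the body of a query job request with allowLargeResults enabled and a destination table named. The field layout follows the BigQuery REST job configuration as commonly documented; the helper name build_large_results_query, the project, dataset, and table names, and the SQL text are all hypothetical placeholders. The code only constructs and prints the request body; it does not contact BigQuery.

```python
import json


def build_large_results_query(sql, project_id, dataset_id, table_id):
    """Return a query job request body that asks for materialized results.

    Hypothetical helper for illustration only; it builds a plain dict and
    performs no network calls.
    """
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Opt in to large (materialized) results explicitly.
                "allowLargeResults": True,
                # A named destination table is required in this mode.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# All names below are placeholders, not real resources.
body = build_large_results_query(
    "SELECT * FROM [my_dataset.my_table]",
    "my-project",
    "my_dataset",
    "large_results",
)
print(json.dumps(body, indent=2))
```

Because the results land in a named table, a follow-up query or an export job can refer to my_dataset.large_results directly.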

The destination table is required because of how large results tend to be

used; because they are too large to be processed by a human, they are usually

going to be processed by a machine. Often you want to run another query

against your large result table, or maybe you want to export the results to

Google Cloud Storage so that you can process them on an external system.

In these cases, it is useful to have a table with a name to refer to as the source

of those subsequent operations.

Finally, writing out large results is expensive. Although BigQuery doesn't

charge you for normal query results, it does charge you for the storage used

for anything with a destination table. If you write out a 1 TB table, you have

to pay the cost of storing that table.
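
To make the storage charge concrete, here is a small back-of-the-envelope sketch. The price used is a made-up placeholder, not a quoted BigQuery rate, and the function name monthly_storage_cost is hypothetical; it also assumes storage is billed per gigabyte per month with 1 GB taken as 2**30 bytes. Check the current pricing page for real numbers.

```python
def monthly_storage_cost(table_bytes, price_per_gb_month):
    """Estimate the monthly storage charge for a table of the given size.

    Assumes billing per GB per month, with 1 GB = 2**30 bytes.
    """
    gigabytes = table_bytes / 2**30
    return gigabytes * price_per_gb_month


ONE_TB = 2**40  # bytes

# 0.02 is a placeholder price per GB per month, chosen only for illustration.
cost = monthly_storage_cost(ONE_TB, 0.02)
print(f"Estimated monthly storage cost: {cost:.2f}")
```

The charge continues for as long as the destination table exists, so deleting large result tables once they are no longer needed keeps it from accumulating.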

Architecture Comparisons

To understand BigQuery, it can be helpful to put it in context with other

architectures you may already be familiar with. If you're coming from a

background where you've used a lot of relational databases, or you use

Hadoop to process your data, you might have certain assumptions about

what is going to be fast, what is going to be slow, or what is going to be

impossible. BigQuery's architecture creates a different set of things that are
