Understanding Query Execution - Google BigQuery Analytics

Database Reference

In-Depth Information

the sum of a few values (one value per shard). Performing more work in the

shards means that the query can scale out efficiently.

This mechanism just described—shards reading the raw table data and

performing filters and partial aggregation, and then passing results up to

a mixer that does further aggregation—is the basis of how Dremel works.

Each aggregation method, from COUNT() to SUM() to STDDEV() , and so

on, can be partitioned into parallel operations that can be combined into a

final result or results. Operations like GROUP BY , as well, can be partially

performed in the shards and subsequently combined in the mixer.

In a typical Dremel tree, there are hundreds or thousands of shards. Because

aggregating the results from all these shards is more work than a single

mixer could handle, the mixers are arranged in a tree. At the root of the tree

is the root mixer, which is responsible for responding back to the caller with

the result data. Figure 9.2 shows a Dremel tree with two levels of mixers.

Figure 9.2 Dremel serving tree

Basic Queries

To help you understand how Dremel works, this section looks at a few

queries and walks you through how they are executed. For clarity in these

examples, we only describe the operations of a single mixer. When there

are multiple levels of mixers, they just repeat the same operations, so if you

understand how one works, you understand how a tree of them works. These

queries all use the publicdata:samples.shakespeare table which

contains counts for each word used in each Shakespeare play.

Search WWH ::

Custom Search

Home