The first query performs a filter and a sort:
SELECT word, corpus, word_count
FROM [publicdata:samples.shakespeare]
WHERE LENGTH(word) > 4 AND NOT REGEXP_MATCH(word, "^[A-Z]+")
ORDER BY word_count DESC
LIMIT 5
This query computes the top five words in any Shakespeare play by
frequency, returning the word, the play in which it appears, and the word
count. It filters out any word of four or fewer letters and any word that
begins with an uppercase letter (in order to ignore character names, which
appear in all caps). If you run the query, you find that the top value is
“shall,” which appears 119 times each in both Merry Wives of Windsor and
Henry VI, Part 2.
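To make the filter and sort concrete, here is a minimal Python sketch of the same logic run over a handful of in-memory rows. The rows, the corpus names, and the function name top_words are invented for illustration and are not data from the sample table. The sketch also assumes that REGEXP_MATCH behaves like an unanchored search, which is how re.search works.

```python
import re

# Made-up rows for illustration only; not real counts from the sample table.
ROWS = [
    {"word": "shall", "corpus": "play_a", "word_count": 12},
    {"word": "HAMLET", "corpus": "play_a", "word_count": 40},
    {"word": "Though", "corpus": "play_b", "word_count": 30},
    {"word": "thee", "corpus": "play_b", "word_count": 25},
    {"word": "would", "corpus": "play_b", "word_count": 9},
    {"word": "their", "corpus": "play_a", "word_count": 7},
]

# Same pattern as the query: one or more uppercase letters at the start.
STARTS_UPPER = re.compile(r"^[A-Z]+")


def top_words(rows, limit=5):
    """Mirror WHERE ... ORDER BY word_count DESC LIMIT on a list of dicts."""
    kept = [
        r for r in rows
        if len(r["word"]) > 4 and not STARTS_UPPER.search(r["word"])
    ]
    kept.sort(key=lambda r: r["word_count"], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    for row in top_words(ROWS):
        print(row["word"], row["corpus"], row["word_count"])
```

Note that "thee" is dropped because its length is exactly four, and "Though" is dropped even though it is not all caps, because the pattern only checks how the word starts.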
Query Execution
When you run this query, Dremel performs the following steps:
1. The mixer receives the query. Its job is to parallelize the query so that it
can be sent to the shards for execution. The first thing that it does is
translate the query into a form that can be handled by the shards—in
many cases, this means simplifying the query. In this case, however, the
entire query is meaningful to the shard. The mixer then looks up the
table name and translates it into its underlying file names.
Each shard gets a subset of the files to operate over. The number of files
may limit the amount of parallelism possible in the query. If there are
1,000 shards available but only 10 files, only 10 shards will be involved
in the query. For small tables like the Shakespeare sample table, only a
few shards may be used. For large tables, there may be more files
than shards, and each shard will be responsible for processing multiple
files. In general, BigQuery manages the file count to balance
performance with shard efficiency. (That is, too many files can be just as
bad as too few because you'll end up with shards that spend most of
their time opening files.)
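The relationship between file count and parallelism described in step 1 can be sketched in a few lines of Python. This is only an illustration: the round-robin policy, the function name assign_files, and the file names are assumptions, since the text does not say how the mixer actually distributes files across shards.

```python
def assign_files(files, num_shards):
    """Spread files over shards round-robin (an assumed policy).

    Returns one list of file names per shard that has work to do;
    shards that would receive no file are left out of the result.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # No more shards can be active than there are files.
    active = min(len(files), num_shards)
    assignment = [[] for _ in range(active)]
    for i, name in enumerate(files):
        assignment[i % active].append(name)
    return assignment


if __name__ == "__main__":
    files = [f"file-{i}" for i in range(10)]  # hypothetical file names
    # 1,000 shards available but only 10 files: 10 shards do work.
    print(len(assign_files(files, 1000)))
    # More files than shards: each shard handles several files.
    print([len(batch) for batch in assign_files(files, 3)])
```

With 10 files and 1,000 available shards, only 10 shards receive work; with 10 files and 3 shards, every shard processes three or four files.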
2. The shards receive the customized query. They open the underlying files
that they were assigned and start reading. Because each field in
ColumnIO is stored in a separate file, this means opening one file per