If you try to partition your table into more pieces than BigQuery has shards
for that table, you won't get an error, but the data won't be evenly balanced
across the partitions. If the table has only a single shard and you ask for
partition 0 of 100, you will likely get a partition that contains all the data
in the table; in this case partitions 1 through 99 would all be empty.
Like other decorator types, but unlike HASH partitioning, partition
decorators can be used anywhere that a table is read from in BigQuery.
This means you can use tabledata.list() to read from a table partition.
Chapter 12, “External Data Processing,” describes how this can be useful
when performing a MapReduce over the table. Alternatively, you can copy a
single partition or export a single partition. On the other hand, decorators
cannot be used to sample the results of a subquery, whereas HASH
partitioning can be applied to the results of subqueries.
Stable Partitioning with Snapshot Decorators
Whether you use HASH partitioning or partition decorators, you can run
into trouble if you query several non-overlapping portions of the table
while the underlying table is changing. Say you're using HASH
partitioning to query the table in three different chunks and append the
results together:
-- 0
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 0 GROUP BY title;
-- 1
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 1 GROUP BY title;
-- 2
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 2 GROUP BY title;
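The three queries above differ only in the remainder they select. As a minimal sketch, assuming you build the query text on the client before submitting it, the following Python helper generates one such query per chunk. The function name hash_partition_queries and its parameters are hypothetical and are not part of any BigQuery API; the helper only produces strings and does not run anything.

```python
def hash_partition_queries(table, column, num_chunks):
    """Build one legacy-SQL query per chunk, mirroring the queries above.

    Hypothetical helper for illustration; it only formats query text.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    template = (
        "SELECT {col}, COUNT(*) FROM [{table}] "
        "WHERE ABS(HASH({col}) % {n}) == {i} GROUP BY {col};"
    )
    # One query per remainder value 0 .. num_chunks - 1.
    return [
        template.format(col=column, table=table, n=num_chunks, i=i)
        for i in range(num_chunks)
    ]


queries = hash_partition_queries("publicdata:samples.wikipedia", "title", 3)
for q in queries:
    print(q)
```

Because every query is built from the same template, the chunks stay non-overlapping by construction. Generating the text this way does nothing to protect against the table changing between runs.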
What if the table changes in between the first and second queries? You're
going to end up with results that don't actually reflect the underlying table
at any particular point in time. The issue is even more severe with partition