Advanced Queries - Google BigQuery Analytics

If you try to partition your table into more pieces than BigQuery has shards
for that table, you won't get an error, but you won't get an even balance
either. If the table has only a single shard and you ask for partition 0 of
100, you will likely get a partition that contains all the data in the table;
in that case, partitions 1 through 99 would all be empty.
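The imbalance can be illustrated with a toy model. The sketch below assumes that each partition reads whole shards and that shards are handed out round-robin; this assignment rule and the function name rows_per_partition are illustrative assumptions, not BigQuery's documented behavior.

```python
def rows_per_partition(shard_sizes, num_partitions):
    """Toy model (assumption): partition i reads every shard whose
    index satisfies shard_index % num_partitions == i."""
    totals = [0] * num_partitions
    for shard_index, size in enumerate(shard_sizes):
        totals[shard_index % num_partitions] += size
    return totals


# One shard split 100 ways: partition 0 holds every row, the rest are empty.
single_shard = rows_per_partition([1000], 100)

# Four equal shards split 3 ways: one partition reads two shards.
four_shards = rows_per_partition([250, 250, 250, 250], 3)

print(single_shard[0], sum(single_shard[1:]))
print(four_shards)
```

Under this model a partition can never be smaller than a shard, which is why asking for more partitions than there are shards leaves some partitions empty.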

Like other decorator types, but unlike HASH partitioning, partition
decorators can be used anywhere that BigQuery reads from a table. This means
you can use tabledata.list() to read from a table partition. Chapter 12,
“External Data Processing,” describes how this can be useful when performing
a MapReduce over the table. Alternatively, you can copy or export a single
partition. On the other hand, decorators cannot be used to sample the results
of a subquery, whereas HASH partitioning can be applied to the results of
subqueries.

Stable Partitioning with Snapshot Decorators

Whether you use HASH partitioning or partition decorators, you can run into
trouble if you query several non-overlapping portions of a table while the
underlying table is changing. Say you're using HASH partitioning to query the
table in three different chunks and append the results together:

-- 0
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 0 GROUP BY title;

-- 1
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 1 GROUP BY title;

-- 2
SELECT title, COUNT(*) FROM
[publicdata:samples.wikipedia]
WHERE ABS(HASH(title) % 3) == 2 GROUP BY title;
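The behavior of the three queries above can be sketched in plain Python. This is a toy model only: the byte-sum function toy_hash is a hypothetical stand-in for BigQuery's HASH(), and the table is a short list of made-up titles. It shows that the three chunks add up to the full result when the table is stable, and that the combined result matches no single version of the table if rows are appended after the first chunk has been read.

```python
from collections import Counter


def toy_hash(title):
    # Hypothetical stand-in for HASH(): sum of the UTF-8 bytes.
    # It is never negative, so no ABS() is needed here.
    return sum(title.encode("utf-8"))


def query_chunk(table, index, num_chunks=3):
    """Mimic SELECT title, COUNT(*) ... WHERE hash % 3 == index GROUP BY title."""
    return Counter(t for t in table if toy_hash(t) % num_chunks == index)


def combine(chunks):
    total = Counter()
    for chunk in chunks:
        total.update(chunk)
    return total


before = ["Ada", "Bob", "Eve", "Ada", "Eve"]

# Stable table: the chunks are disjoint and together equal one full query.
stable = combine(query_chunk(before, i) for i in range(3))
stable_ok = stable == Counter(before)

# Changing table: two rows are appended after chunk 0 has been read.
chunk0 = query_chunk(before, 0)
after = before + ["Eve", "Ada"]
mixed = combine([chunk0, query_chunk(after, 1), query_chunk(after, 2)])

print(stable_ok)
print(mixed == Counter(before), mixed == Counter(after))
```

In the changing case, the new "Eve" row is missed because its chunk was read too early, while the new "Ada" row is counted, so the appended result is a mix of two table states.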

What if the table changes in between the first and second queries? You're
going to end up with results that don't actually reflect the underlying table
at any particular point in time. The issue is even more severe with partition
