Query Result Size Limitations
When you run a normal query in BigQuery, the response size is limited to
128 MB of compressed data. It can be hard to know what 128 MB of
compressed data means in practice. Does the data compress 2x? 10x? Results
are compressed column by column, so the compression ratio tends to be very
good. For example, a column that holds the name of a country will likely
contain only a few distinct values. A column with few distinct values carries
little unique information, so it generally compresses well. Encrypted blobs of
data, by contrast, will likely compress poorly because their bytes are mostly
random.
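To get a feel for the gap between these two cases, the sketch below compresses a low-cardinality "country" column and an equally sized column of random bytes. It uses Python's zlib purely as a stand-in; BigQuery's actual columnar encoding is not described here, so the exact ratios are only illustrative. The country names and the column length are arbitrary choices for the example.

```python
import os
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size (zlib is a stand-in)."""
    return len(data) / len(zlib.compress(data))


rng = random.Random(42)  # seeded so the country column is reproducible

# Low-cardinality column: 100,000 values drawn from only four country names.
countries = ["Canada", "France", "Japan", "Brazil"]
country_column = "\n".join(rng.choice(countries) for _ in range(100_000)).encode()

# "Encrypted blob" column: the same number of bytes, but random noise.
random_column = os.urandom(len(country_column))

country_ratio = compression_ratio(country_column)
random_ratio = compression_ratio(random_column)
print(f"country column compresses {country_ratio:.1f}x")
print(f"random column compresses {random_ratio:.2f}x")
```

The random column typically comes out at roughly 1x (or very slightly worse, because of framing overhead), while the country column shrinks several-fold.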
It is easy to write a query that returns large results without intending to;
maybe you ran a SELECT * query when you only wanted to see the first few
rows of data, or you only wanted the top few rows of the query results. To
avoid doing a lot of extra work writing out massive results that nobody
needed, BigQuery by default fails a query whose result set is too large. If
you don't need all the results, you can simply add LIMIT 1000 to the end of
the query, and you'll get back at most 1000 rows.
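If you build queries programmatically, appending the LIMIT clause can be wrapped in a small helper. The function below, add_limit, is a hypothetical illustration rather than part of any BigQuery client library; it assumes the query does not already end in its own LIMIT clause, and the table name mydataset.mytable is likewise made up.

```python
def add_limit(sql: str, n: int = 1000) -> str:
    """Append a LIMIT clause so the query returns at most n rows.

    Hypothetical helper. Assumes the query does not already end in a
    LIMIT clause; a trailing semicolon, if any, is dropped first.
    """
    if n <= 0:
        raise ValueError("n must be a positive row count")
    return f"{sql.strip().rstrip(';').rstrip()} LIMIT {n}"


print(add_limit("SELECT * FROM mydataset.mytable;"))
# SELECT * FROM mydataset.mytable LIMIT 1000
```

Without an ORDER BY, which 1000 rows come back is not specified, which is fine when you only want a quick look at the data.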
Although the rationale behind the query result size limit is closely tied to
the BigQuery architecture, at a high level it exists because the entire query
result must be returned from a single worker in the compute cluster. An
operation that must run on a single worker cannot scale out, so the amount of
data that worker returns has to be bounded.
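A toy cost model makes the scaling argument concrete. It is a deliberate simplification, not a description of BigQuery's real scheduler: every worker is assumed to write at the same hypothetical throughput, and the only question is whether one worker writes all shards of the result or each worker writes its own shard at the same time.

```python
MB = 1_000_000  # decimal megabytes, to keep the arithmetic round


def single_worker_seconds(shard_sizes: list[int], bytes_per_second: float) -> float:
    # One worker must emit every shard itself, so the write times add up.
    return sum(shard_sizes) / bytes_per_second


def parallel_seconds(shard_sizes: list[int], bytes_per_second: float) -> float:
    # Each worker writes its own shard at the same time; the largest shard
    # determines when the whole write finishes.
    return max(shard_sizes) / bytes_per_second


# Hypothetical numbers: 100 workers, 50 MB of results each, 100 MB/s per worker.
shards = [50 * MB] * 100
throughput = 100 * MB

print(single_worker_seconds(shards, throughput))  # 50.0
print(parallel_seconds(shards, throughput))       # 0.5
```

In this model, doubling the number of workers (and therefore the total result size) doubles the single-worker time but leaves the parallel time unchanged, which is what "doesn't scale out" means here.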
Sometimes, however, you want to see results larger than 128 MB. You can
work around this limit by setting the allowLargeResults flag on the query.
This causes each BigQuery worker to write its portion of the results out
independently. Because allowLargeResults queries write their results in
parallel, there is no limit on their size. The only requirement is that you
must specify a destination table for the query, so that you know where to find
the results afterward and can manage the table's lifetime. Even though
results are written in parallel, writing large results can be significantly
slower than writing small ones.
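Concretely, the flag and the destination table are both fields of the query job configuration. The sketch below only builds a request body for the BigQuery REST API's jobs.insert method, using the JobConfigurationQuery field names allowLargeResults, destinationTable, and useLegacySql; authenticating and actually sending the request are left out. The project, dataset, and table names are hypothetical. In the current API, allowLargeResults applies only to legacy SQL queries, which is why the sketch sets useLegacySql explicitly.

```python
import json


def large_results_job(sql: str, project_id: str, dataset_id: str, table_id: str) -> dict:
    """Build a jobs.insert request body for a query that may return large results."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # allowLargeResults only has an effect for legacy SQL queries.
                "useLegacySql": True,
                "allowLargeResults": True,
                # Required when allowLargeResults is set: where the results go.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Hypothetical project, dataset, and table names.
body = large_results_job(
    "SELECT * FROM [mydataset.mytable]",
    project_id="my-project",
    dataset_id="mydataset",
    table_id="big_results",
)
print(json.dumps(body, indent=2))
```

Because the results land in a named table, you can query it later and delete it or set an expiration when you no longer need it.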