In-Depth Information
If ColumnIO Compresses So Well, Why Does BigQuery Charge per Byte Instead of per Compressed Byte?
BigQuery charges for storage based on the number of bytes stored in
the table without accounting for compression. As previously described,
BigQuery stores data as compressed and relies on this compression for
performance. If your data in BigQuery is highly compressible, shouldn't
you be rewarded with smaller storage costs?
There are two reasons you're charged full freight for your stored data.
The first is that just because the data compresses well doesn't mean
that it is much cheaper to query. For example, when you compare long
strings in a WHERE clause, the query engine has to read each whole
string to determine whether there is a match, even if that string
compressed well on disk. And although BigQuery could compute one size
for queries and another for storage, doing so would add significant
complexity without much additional benefit.
The second rationale behind the pricing is predictability. Say you
upload a column containing 1,000 integers (8,000 bytes uncompressed)
today and get charged for 2,000 bytes because those integers compressed
at a 4-to-1 compression ratio. If you then upload another 1,000 integers
tomorrow, you might get a completely different compression ratio.
Moreover, BigQuery periodically reshuffles data as needed by the
service. Sometimes that reshuffling changes how the data is split
between files, which can change the compression ratio for better or
worse. It would be highly surprising if the storage costs for a table
changed from day to day just because BigQuery moved the underlying data
around.
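As a rough sketch of the arithmetic above, the following Python snippet uses zlib as a stand-in for ColumnIO's actual encoding (which it is not) and assumes 8-byte INT64 values: the billed (logical) size of the column is fixed at 8,000 bytes, while the compressed size depends on the data itself and can change whenever the bytes are re-encoded.

import random
import struct
import zlib

# 1,000 integers, 8 bytes each when stored as INT64 values.
values = [random.randint(0, 1000) for _ in range(1000)]
raw = b"".join(struct.pack("<q", v) for v in values)

logical_bytes = len(raw)                  # 8,000 bytes: what storage is billed on
physical_bytes = len(zlib.compress(raw))  # varies with the data and the codec

print("billed (logical) bytes:", logical_bytes)
print("compressed bytes today:", physical_bytes)

Rerunning this with different (or reshuffled) values changes the compressed size but not the logical size, which is exactly the property that keeps the bill stable from day to day.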
The compromise that was chosen was to charge for uncompressed bytes,
but at a rate that is perhaps lower than would otherwise have been
picked.