In-Depth Information
If ColumnIO Compresses So Well, Why Does BigQuery Charge per Byte Instead of per Compressed Byte?
BigQuery charges for storage based on the number of bytes stored in
the table without accounting for compression. As previously described,
BigQuery stores data as compressed and relies on this compression for
performance. If your data in BigQuery is highly compressible, shouldn't
you be rewarded with smaller storage costs?
There are two reasons you're charged full freight for your stored data.
The first is that just because the data compresses well doesn't mean
that it is much cheaper to query. For example, when you compare long
strings in a WHERE clause, the query engine has to read each whole
string to determine whether there is a match, even if that string
compressed well on disk. And although BigQuery could compute one size
for queries and another for storage, doing so would add significant
complexity without much additional benefit.
The second rationale behind the pricing is predictability. Say you
upload a column containing 1,000 integers (8,000 bytes uncompressed)
today and get charged for 2,000 bytes because those integers compressed
at a 4-to-1 compression ratio. If you then upload another 1,000 integers
tomorrow, you might get a completely different compression ratio.
Moreover, BigQuery periodically reshuffles data as needed by the
service. Sometimes that reshuffling changes how the data is split
between files, which can change the compression ratio for better or
worse. It would be highly surprising if the storage costs for a table
changed from day to day just because BigQuery moved the underlying data
around.
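As a rough sketch of the arithmetic above, the following Python snippet uses zlib as a stand-in for ColumnIO's actual encoding (which it is not) and assumes 8-byte INT64 values: the billed (logical) size of the column is fixed at 8,000 bytes, while the compressed size depends on the data itself and can change whenever the bytes are re-encoded.

import random
import struct
import zlib

# 1,000 integers, 8 bytes each when stored as INT64 values.
values = [random.randint(0, 1000) for _ in range(1000)]
raw = b"".join(struct.pack("<q", v) for v in values)

logical_bytes = len(raw)                  # 8,000 bytes: what storage is billed on
physical_bytes = len(zlib.compress(raw))  # varies with the data and the codec

print("billed (logical) bytes:", logical_bytes)
print("compressed bytes today:", physical_bytes)

Rerunning this with different (or reshuffled) values changes the compressed size but not the logical size, which is exactly the property that keeps the bill stable from day to day.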
The compromise that was chosen was to charge for uncompressed bytes,
but at a rate that is perhaps lower than would otherwise have been
picked.