Database Reference
In-Depth Information
SELECT * Considered Harmful
When BigQuery was initially released, it didn't support
SELECT *
. This
wasn't an oversight; it was intended to prevent users from
unintentionally incurring unnecessary charges. If you consider how
BigQuery works, you'll realize that
SELECT *
is going to tell BigQuery
to read every byte in the table as fast as possible; that is going to be an
expensive operation. If you have a terabyte of data in your table,
SELECT *
is going to cost you $5 (at current prices). Because
SELECT
*
doesn't actually change your data, you could have just read it via
TableData.list()
for free.
The initial assumption was that if you wanted to read all the columns in
the table, you would be willing to list all those columns to indicate that
you're serious about it. However, several customers complained, so
SELECT *
was added. That said, most of the time, there are better
ways to get the data you want than using
SELECT *
.
In the relational database world, it is common to run a
SELECT *
query to check out a table and try to understand the data in it. Maybe
one column is always
NULL
, or you want to find out whether the
State
column uses abbreviations. If you're just trying to get a feel for your
data, you can read the rows directly from the table (for free!) rather
than running a query. If you have access to the
bq
command-line client,
you can use the “
bq head tablename
” command. If you use the
BigQuery Web UI, you can select the table and click the Details button,
which shows you the first few rows. Both of these mechanisms use the
underlying
TableData.list()
API, which is free, instead of running
a query.
Sometimes people who run
SELECT *
actually do want to read all the
data in the table. Maybe they want to make a copy of the table or want
to download the table. If you use
SELECT *
for either of these reasons,
there are better, less expensive ways to accomplish your goals.
If you want to download the table, consider an Export job, which can
write out the table as CSV or JSON to a Google Cloud Storage location