...     ).execute()['totalBytesProcessed']
u'552786'
With the Tables.get() method, you can see that the table has 42,522 rows:
>>> service.tables().get(
...     projectId=project_id,
...     datasetId='reference',
...     tableId='zip_codes').execute()['numRows']
u'42522'
If there were no STRING fields read by the query, the number of rows would be enough to determine the number of bytes processed. However, the state field has a variable length (it will be different in every row). The number of bytes processed per row should be the length of the state field plus 2 bytes, plus 8 bytes for the population field and 1 byte for the decommissioned field. The following query computes the cost by adding up the field sizes:
>>> cost_query = """
...     SELECT state_len + pop_len + decommissioned_len FROM (
...       SELECT SUM(LENGTH(state) + 2) AS state_len,
...         8 * COUNT(population) AS pop_len,
...         COUNT(decommissioned) AS decommissioned_len
...       FROM [bigquery-e2e:reference.zip_codes])
...     """
>>> service.jobs().query(
...     projectId=project_id,
...     body={'query': cost_query}
...     ).execute()['rows'][0]['f'][0]['v']
u'552786'
This query returns the same value as the previous totalBytesProcessed: 552,786, or just over half a megabyte. Of course, running a query to compute how much a query costs is not particularly efficient, because that query will cost you money, too. What if you want to find out how much a query will cost before actually running it?
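Before turning to that, the per-field sizes described earlier can be sketched in plain Python. This is a hypothetical helper, not part of the BigQuery API, and the sample rows are made up; summing it over every row of the table is exactly what the SQL query above does server side:

```python
# A rough sketch of the per-row byte accounting: a STRING field is
# charged its length plus 2 bytes, an INTEGER 8 bytes, and a BOOLEAN
# 1 byte; a NULL field costs nothing. Field names mirror the
# zip_codes table, but the sample rows below are invented.
def row_bytes(state, population, decommissioned):
    bytes_read = 0
    if state is not None:
        bytes_read += len(state) + 2   # STRING: length + 2 bytes
    if population is not None:
        bytes_read += 8                # INTEGER: 8 bytes
    if decommissioned is not None:
        bytes_read += 1                # BOOLEAN: 1 byte
    return bytes_read

sample_rows = [('MA', 6547629, False), ('XX', None, True)]
print(sum(row_bytes(*row) for row in sample_rows))
# (2+2) + 8 + 1 = 13 for the first row, (2+2) + 1 = 5 for the second
```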