...     ).execute()['totalBytesProcessed']
u'552786'
With the Tables.get() method, you can see that the table has 42,522 rows:
>>> service.tables().get(
...     projectId=project_id,
...     datasetId='reference',
...     tableId='zip_codes').execute()['numRows']
u'42522'
If there were no STRING fields read by the query, the number of rows would be enough to determine the number of bytes processed. However, the state field has a variable length (it will be different in every row). The number of bytes processed per row should be the length of the state field plus 2 bytes, plus 8 bytes for the population field and 1 byte for the decommissioned field. The following query computes the cost by adding up the field sizes:
>>> cost_query = """
...     SELECT state_len + pop_len + decommissioned_len FROM (
...       SELECT SUM(LENGTH(state) + 2) AS state_len,
...         8 * COUNT(population) AS pop_len,
...         COUNT(decommissioned) AS decommissioned_len
...       FROM [bigquery-e2e:reference.zip_codes])
...     """
>>> service.jobs().query(
...     projectId=project_id,
...     body={'query': cost_query}
...     ).execute()['rows'][0]['f'][0]['v']
u'552786'
This query returns the same value as the previous totalBytesProcessed: 552,786, or just over half a megabyte. Of course, running a query to compute how much a query costs is not particularly efficient, because that query will cost you money, too. What if you want to find out how much a query will cost before actually running it?
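Before turning to that, the per-field sizes described earlier can be sketched in plain Python. This is a hypothetical helper, not part of the BigQuery API, and the sample rows are made up; summing it over every row of the table is exactly what the SQL query above does server side:

```python
# A rough sketch of the per-row byte accounting: a STRING field is
# charged its length plus 2 bytes, an INTEGER 8 bytes, and a BOOLEAN
# 1 byte; a NULL field costs nothing. Field names mirror the
# zip_codes table, but the sample rows below are invented.
def row_bytes(state, population, decommissioned):
    bytes_read = 0
    if state is not None:
        bytes_read += len(state) + 2   # STRING: length + 2 bytes
    if population is not None:
        bytes_read += 8                # INTEGER: 8 bytes
    if decommissioned is not None:
        bytes_read += 1                # BOOLEAN: 1 byte
    return bytes_read

sample_rows = [('MA', 6547629, False), ('XX', None, True)]
print(sum(row_bytes(*row) for row in sample_rows))
# (2+2) + 8 + 1 = 13 for the first row, (2+2) + 1 = 5 for the second
```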