Running Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

of your choosing. If you want to make a copy of the table, you can run a

Copy job. Both these options are available from the command line or

the web UI.

There are valid use cases in which you still want to use SELECT * . For

example, maybe you want to filter your table and you want to create a

smaller temporary table (as in SELECT * FROM . . . WHERE . .

. ). Using SELECT * in an inner query (as in SELECT foo FROM

(SELECT * from tablename) , which will read only the foo

column) is efficient because only the columns that are used in the outer

query actually need to be read.

The bottom line is this: If you use SELECT * in your query, you should

pause a moment and consider whether there is a better option.

Determining Query Cost

You can tell how much a query costs by reading the totalBytesProcesed

field in the response to Jobs.query() or in the statistics field in the

Job resource returned by Jobs.get() . Consider the following query, which

counts the number of active ZIP codes with nonzero populations in each

state:

>>> query = """

… SELECT state, COUNT(*) AS cnt

… FROM [bigquery-e2e:reference.zip_codes]

… WHERE population > 0 AND decommissioned = false

… GROUP BY state, ORDER BY cnt DESC

… """

This query reads three fields from the table

bigquery-e2e:reference.zip_codes : state , decommissioned ,

and population . Here is a Python command that runs the query and

returns only the number of bytes processed:

>>> service.jobs().query(

…

projectId=project_id ,

…

body={'query': query, 'useQueryCache': False}

Search WWH ::

Custom Search

Home