$ bq --job_id=${JOB_ID} \
    load scratch.table2 temp.csv \
    "f1:integer,f2:float,f3:string"
Waiting on bqjob_r11463cdf65f08230_00000140037462c6_1 ... (36s) Current status: DONE
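The ${JOB_ID} variable must be set before this command runs; its value isn't shown in the listing, but one simple convention (an assumption here, not the only option) is to derive a unique ID from the current time:
$ JOB_ID=job_$(date +%s)   # any ID unique among your jobs will do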
$ bq --format=prettyjson show -j ${JOB_ID} | grep outputBytes
    "outputBytes": "21",
Here you can see that 21 bytes were loaded: 8 bytes for each of the two numeric fields, plus 3 bytes for “foo”, plus 2 bytes for the null-terminator (2 × 8 + 3 + 2 = 21).
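For reference, a temp.csv consistent with that accounting (its exact contents are an assumption; only the byte total appears above) would hold a single row with one integer, one float, and the string “foo”:
$ cat temp.csv
1,2.0,foo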
Processing Costs
BigQuery charges you for the number of bytes scanned by a query. This is
roughly proportional to the amount of work that BigQuery does for each
query because all queries are essentially table scans—that is, they must read
all the rows in the table.
Each column in a table is stored separately, however, so BigQuery needs
to read only the columns that are directly referenced by the query. This
selectivity can make it somewhat difficult to know how much a given query
will cost, especially because the number of bytes per column is not exposed
to users, only the total number of bytes in the table.
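You can, however, check a table's total size with bq show; the numBytes field is part of the standard table metadata:
$ bq --format=prettyjson show publicdata:samples.wikipedia \
    | grep numBytes
Dividing that total by the number of columns gives only a crude per-column estimate, because columns can vary widely in size.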
Luckily, there is a mechanism to determine query cost (in bytes processed) without paying for it: running the query in dry run mode. A dry run reports the resources a job would consume but does not actually run the job, which makes it handy for estimating what a query would cost. The command shown here uses bq in dry run mode to find out how much it would cost (in bytes) to query over the title field of the public Wikipedia table:
$ bq query --dry_run --format=prettyjson \
    "select title from publicdata:samples.wikipedia" \
    | grep totalBytesProcessed
    "totalBytesProcessed": "7294285723"
The lazy (and perhaps spendthrift) way to determine the cost of the query is just to run it. As mentioned in a previous section, the statistics of a completed query job report the same totalBytesProcessed value.