Advanced Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

BigQuery implicitly generated a cartesian product for each independently

repeating pair, it would be easy to inadvertently construct computationally

expensive queries.

Just because an operation is expensive does not mean it isn't the right

thing to do. So BigQuery supports a FLATTEN operator that allows you to

explicitly specify that a cartesian product is required.

FLATTEN(< table value >, < field >)

This operation converts the specified repeated field to an optional field,

generating a new record for each value in the list and copying all other fields

through unchanged. Before you port your relational query to BigQuery using

this operator, consider a simpler example. Here is a query counting separate

fields in the table.

SELECT COUNT(ts) AS records, COUNT(running.name) AS

apps

FROM [bigquery-e2e:ch10.sample_data]

As discussed earlier the counts are different because repeated fields are not

quite handled like a join between two relational tables. If you insert the

FLATTEN operator into the query, you get a different result.

SELECT COUNT(ts) AS records, COUNT(running.name) AS

apps

FROM FLATTEN([bigquery-e2e:ch10.sample_data], running)

The two counts returned by this query will be identical and will match

the application count column in the previous query. This result exactly

matches what you would expect from a relational database, an indication

that FLATTEN multiplies the number or records like a JOIN .

Now you can get back to the query comparing wireless networks and

applications. You can choose to flatten either of the two repeated fields

involved. It is preferable to choose the field that has lower average

multiplicity, but in this case there is no obvious choice, so arbitrarily flatten

the wireless field.

SELECT

wireless.ssid AS ssid,

Search WWH ::

Custom Search

Home