Database Reference
In-Depth Information
BigQuery implicitly generated a cartesian product for each independently
repeating pair, it would be easy to inadvertently construct computationally
expensive queries.
Just because an operation is expensive does not mean it isn't the right
thing to do. So BigQuery supports a FLATTEN operator that allows you to
explicitly specify that a cartesian product is required.
FLATTEN(< table value >, < field >)
This operation converts the specified repeated field to an optional field,
generating a new record for each value in the list and copying all other fields
through unchanged. Before you port your relational query to BigQuery using
this operator, consider a simpler example. Here is a query counting separate
fields in the table.
SELECT COUNT(ts) AS records, COUNT(running.name) AS
apps
FROM [bigquery-e2e:ch10.sample_data]
As discussed earlier the counts are different because repeated fields are not
quite handled like a join between two relational tables. If you insert the
FLATTEN operator into the query, you get a different result.
SELECT COUNT(ts) AS records, COUNT(running.name) AS
apps
FROM FLATTEN([bigquery-e2e:ch10.sample_data], running)
The two counts returned by this query will be identical and will match
the application count column in the previous query. This result exactly
matches what you would expect from a relational database, an indication
that FLATTEN multiplies the number or records like a JOIN .
Now you can get back to the query comparing wireless networks and
applications. You can choose to flatten either of the two repeated fields
involved. It is preferable to choose the field that has lower average
multiplicity, but in this case there is no obvious choice, so arbitrarily flatten
the wireless field.
SELECT
wireless.ssid AS ssid,
Search WWH ::




Custom Search