Advanced Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

SELECT src_id, dst_id,

POSITION(src) AS src_index, src,

POSITION(dst) AS dst_index, dst

FROM [bigquery-e2e:ch10.parallel]),

src)

WHERE src_index = dst_index

This works well enough, but keep in mind this is quite a bit more expensive

than the simpler queries possible with the more appropriate schema. The

execution engine has to generate the cross product of the elements in the

list before it can filter it down to matching the entries with the matching

positions. Just to demonstrate that you can actually do something useful

with the result of this query, here is a version that does a little more than

pulling the entries together.

SELECT src,

GROUP_CONCAT(CONCAT(dst_id, CONCAT(":", dst))) AS

dst_list

FROM FLATTEN((

SELECT

src_id, dst_id,

POSITION(src) AS src_index, src,

POSITION(dst) AS dst_index, dst

FROM [bigquery-e2e:ch10.parallel]),

src)

WHERE src_index = dst_index

AND src_id = "decimal"

GROUP BY src

ORDER BY src

This approach generalizes to more than two parallel lists. Each additional

repeated field requires a nested FLATTEN . You should also place the WHERE

clauses, applying the position equality filter in the nested queries, rather

than collecting them all at the top-level to reduce the number of

intermediate rows generated by the cross product. That means one such

condition following each FLATTEN invocation.

Search WWH ::

Custom Search

Home