Advanced Queries - Google BigQuery Analytics

Database Reference

In-Depth Information

Parallel Lists

When you design schemas for BigQuery, it is best to organize repeated data

so that fields that repeat together are collected into a RECORD type field

that is repeated; this allows for simpler queries over the data. Unfortunately,

there are times when you have to deal with data that is structured as

independent lists where elements at equal indexes are related to each other.

For example, consider this table representing mappings between numbers

in different representations.

[

{"name": "src_id", "type": "string"},

{"name": "dst_id", "type": "string"},

{"name": "src", "type": "string", "mode":

"repeated"},

{"name": "dst", "type":"string", "mode": "repeated"}

]

Ideally, you would have preferred a schema that paired the source and

destination values into a single record.

[

{"name": "src_id", "type": "string"},

{"name": "dst_id", "type": "string"},

{"name": "map", "type": "record", "mode":

"repeated", "fields": [

{"name": "src", "type": "string"},

{"name": "dst", "type": "string"}

]}

]

Fortunately, if you are stuck with the former schema, you can still arrange

for the pairs to be constructed on-the-fly.

You can use the POSITION function to access the index of each value, but

you need to include a FLATTEN operation because the fields accessed are

independently repeating.

SELECT src_id, dst_id, src, dst

FROM FLATTEN((

Search WWH ::

Custom Search

Home