BigQuery supports JSON because it can represent schemas with array and RECORD
type fields; if you use those features in your tables, you have to use this
format to load data into them. The JSON format has a detailed specification
with fully defined lexical analysis rules that produce strongly typed values.
The most natural way to encode a list of records in JSON would be to use
the JSON list type, which is basically a comma-delimited series of JSON
values enclosed in square brackets ([]). This encoding, like quoted newlines
in CSV, would make it impossible to process a large JSON-encoded file in
parallel. To overcome this limitation, data processing frameworks (for
example, Hive, http://hive.apache.org ) have introduced a variant of JSON that
is referred to here as newline-delimited JSON. In regular JSON, the newline
character (\n) is treated as whitespace and is allowed to appear anywhere
that whitespace is allowed; it is not, however, allowed to appear inside
strings. There it must be escaped as "\n". (This is a JSON string
representing a single newline character.) Newline-delimited JSON takes
advantage of this and promotes the newline to a record-separating character
that must appear only at record boundaries; it is not permitted to appear
anywhere within a single record. The advantage of this format is that record
boundaries are trivial to detect (simply scan for a newline), making it
convenient for parallel processing. This is the variant of JSON that is
accepted by BigQuery load jobs.
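To see why this matters in practice, here is a minimal Python sketch (not from the original text; the sample data is invented) that splits a newline-delimited JSON buffer into records simply by scanning for newlines. The escaped \n inside the first string value shows why an embedded newline can never be mistaken for a record boundary:

import json

# A buffer containing two newline-delimited JSON records. The first record's
# string value contains an escaped newline (\n), so the only raw newline
# characters in the buffer are the record separators themselves.
ndjson = '{"msg": "line one\\nline two"}\n{"msg": "another record"}\n'

# Scanning for newlines is enough to find record boundaries; each line can
# then be parsed (or handed to a separate worker) independently.
records = [json.loads(line) for line in ndjson.splitlines() if line]
for record in records:
    print(record['msg'])

In a real parallel pipeline, each worker would typically be handed a byte range of the file and would align itself to the nearest newline before parsing, which is only safe because of the escaping rule described above.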
To illustrate how the list and record types are encoded, consider a schema
that includes these features:
load_config['schema'] = {
    'fields': [
        {'name': 'string_f', 'type': 'STRING'},
        {'name': 'boolean_f', 'type': 'BOOLEAN', 'mode': 'REPEATED'},
        {'name': 'record_f', 'type': 'RECORD',
         'fields': [
             {'name': 'integer_f', 'type': 'INTEGER'},
             {'name': 'float_f', 'type': 'FLOAT'}
         ]},
        {'name': 'timestamp_f', 'type': 'TIMESTAMP'}
    ]
}
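For concreteness, here is a sketch (the field values are invented, not from the original text) of how rows matching this schema would be encoded as newline-delimited JSON: the REPEATED boolean field becomes a JSON list, the RECORD field becomes a nested object, and each row occupies exactly one line.

import json

# Hypothetical rows conforming to the schema above. A REPEATED field is
# encoded as a JSON list, a RECORD field as a nested JSON object, and the
# timestamp is given here as a string (one form BigQuery accepts).
rows = [
    {'string_f': 'first', 'boolean_f': [True, False],
     'record_f': {'integer_f': 42, 'float_f': 3.14},
     'timestamp_f': '2012-01-01 00:00:00'},
    {'string_f': 'second', 'boolean_f': [],
     'record_f': {'integer_f': 7, 'float_f': 0.5},
     'timestamp_f': '2012-01-02 12:30:00'},
]

# Newline-delimited JSON: one json.dumps() result per row, joined by '\n'.
ndjson = '\n'.join(json.dumps(row) for row in rows)
print(ndjson)

A file of this form is what a load job expects when its sourceFormat is set to NEWLINE_DELIMITED_JSON.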