Modifying the delimiter and quote is useful when fields contain sentences
and paragraphs that usually have punctuation. You still need to be careful
about the handling of newlines and carriage returns and may need to come
up with a scheme for escaping them or transforming them in your data.
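One possible escaping scheme is sketched below in Python. It is only an illustration: the function names `escape_newlines` and `unescape_newlines` are hypothetical, and the backslash convention is an assumption rather than anything BigQuery prescribes. The idea is to rewrite newlines and carriage returns as two-character sequences before writing the file, and to reverse the transformation after reading values back.

```python
def escape_newlines(text):
    """Replace backslash, LF, and CR with two-character escape sequences."""
    # Escape the backslash first so the encoding stays reversible.
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def unescape_newlines(text):
    """Reverse escape_newlines by scanning for backslash sequences."""
    mapping = {"n": "\n", "r": "\r", "\\": "\\"}
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in mapping:
            out.append(mapping[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


original = "line one\r\nline two with a \\n literal"
escaped = escape_newlines(original)
restored = unescape_newlines(escaped)
```

Because the escaped value contains no raw line breaks, it cannot be mistaken for a record delimiter, and a literal backslash followed by "n" in the original text still round-trips correctly.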
encoding
You just got a taste for the complications that character encoding brings to
the table. If you work with UTF-8, you can basically ignore encoding because
UTF-8 is the encoding used natively by BigQuery and is the encoding used in
the HTTP-based API. If at all possible you want to stick with UTF-8 because
it avoids any difficulty associated with encoding conversions.
Note that even if you use UTF-8 for the values of the field, the lines of data
will not be valid UTF-8 data if you use a field delimiter in the range 128-255.
When BigQuery parses your data, it first splits the data into rows based on
the record delimiter (limited to “\n”, “\r”, and “\r\n”). Then it splits rows
into fields based on the customizable field delimiter and then checks the
encoding of each individual field. The only alternative encoding supported
is ISO-8859-1, commonly known as Latin1. To request that your values be
treated as Latin1 strings and converted to UTF-8, use the following setting:
load['encoding'] = 'ISO-8859-1'
It is also legal to set this field to UTF-8, but that has no effect because
it is the default encoding. If you set the input encoding to ISO-8859-1,
single-byte characters in the range 128-255 will be converted to the
corresponding multibyte UTF-8 characters.
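The parsing order described above (split on the delimiter first, then decode each field) can be imitated with a small Python sketch. This is not BigQuery's implementation; `parse_line` is a hypothetical helper, and the delimiter byte 0xFE and the sample values are chosen only for illustration.

```python
def parse_line(line, delimiter, encoding="UTF-8"):
    """Split a raw row on the delimiter byte, then decode each field."""
    return [field.decode(encoding) for field in line.split(delimiter)]


# UTF-8 fields joined by the byte 0xFE: the line as a whole is not
# valid UTF-8, but each field is valid once the line has been split.
row = "café".encode("utf-8") + b"\xfe" + "naïve".encode("utf-8")
try:
    row.decode("utf-8")
    whole_line_valid = True
except UnicodeDecodeError:
    whole_line_valid = False

fields = parse_line(row, b"\xfe")

# Latin1 input: single bytes in the range 128-255 become multibyte
# sequences once the decoded text is re-encoded as UTF-8.
latin1_fields = parse_line(b"caf\xe9,na\xefve", b",", "ISO-8859-1")
utf8_bytes = latin1_fields[0].encode("utf-8")
```

Here the single Latin1 byte 0xE9 ends up as the two-byte UTF-8 sequence 0xC3 0xA9, which is the kind of conversion the ISO-8859-1 setting requests.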
skipLeadingRows
Many tools that produce CSV include one or more header rows describing
the fields present in the data. It is tedious to have to strip this header
because in practice it means regenerating the entire file to just remove
the first few lines. Instead, you can set a parameter in the configuration to
tell the parser to ignore some number of lines at the start of the file. If
your configuration specifies multiple source files (on GCS), that number of
lines is skipped at the start of each file.
load['skipLeadingRows'] = 6
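To make the per-file behavior concrete, here is a rough local simulation in Python. It only mimics the effect described above and does not call BigQuery; the `read_rows` helper and the in-memory sample files are hypothetical.

```python
import io


def read_rows(sources, skip_leading_rows=0):
    """Collect data rows, skipping the first N lines of every source."""
    rows = []
    for source in sources:
        for index, line in enumerate(source):
            if index < skip_leading_rows:
                continue  # header line: skipped separately for each file
            rows.append(line.rstrip("\r\n"))
    return rows


# Two in-memory "files", each starting with its own header row.
file_a = io.StringIO("name,count\nalpha,1\nbeta,2\n")
file_b = io.StringIO("name,count\ngamma,3\n")

rows = read_rows([file_a, file_b], skip_leading_rows=1)
```

With one leading row skipped per file, only the three data rows remain, and neither header leaks into the result.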