This code causes the first six lines of every file to be ignored.
allowJaggedRows
When encoding records with a lot of fields (columns) that are frequently
null-valued, some tools choose to leave out trailing fields that are null. When
reading this data all columns after the last column present in a row must be
treated as null or absent. Making this data conform to the requirements of
the basic CSV format would mean padding each row with trailing commas
(the field delimiter) to represent the null columns. Again, BigQuery has a
feature that can handle this data.
load['allowJaggedRows'] = True
In this mode BigQuery accepts a row with fewer columns than the number
of fields in the schema, as long as every schema field that is missing from
the row is marked NULLABLE. Note that any null column that appears
before a non-null column must still be explicitly encoded as a blank field in
the row.
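The padding behavior described above can be sketched in a few lines of Python. This is an illustrative client-side model only (BigQuery performs this server-side during the load); the helper name and schema are hypothetical.

```python
import csv
import io

# Hypothetical schema: (field name, mode) pairs, mirroring BigQuery's
# REQUIRED/NULLABLE field modes.
SCHEMA = [("name", "REQUIRED"), ("city", "NULLABLE"), ("zip", "NULLABLE")]

def pad_jagged_row(row, schema=SCHEMA):
    """Model of allowJaggedRows: fill missing trailing fields with None,
    but only if every missing schema field is NULLABLE."""
    if len(row) > len(schema):
        raise ValueError("row has more columns than the schema")
    for _, mode in schema[len(row):]:
        if mode != "NULLABLE":
            raise ValueError("missing value for a non-NULLABLE field")
    return row + [None] * (len(schema) - len(row))

data = "alice,seattle,98101\nbob,portland\ncarol\n"
rows = [pad_jagged_row(r) for r in csv.reader(io.StringIO(data))]
# rows[1] -> ['bob', 'portland', None]
# rows[2] -> ['carol', None, None]
```

Note that a row such as `carol` still supplies a value for the REQUIRED first field; only trailing NULLABLE fields may be omitted, which matches the rule that a null column before a non-null column must be encoded explicitly.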
allowQuotedNewlines
This option deserves careful explanation because it affects the one aspect
of CSV parsing in which BigQuery's default behavior differs from the
specification of the format. The CSV format enables the newline character
to appear within quoted fields. This is necessary to let the format encode
values that contain the line separator. However, it turns out that this feature
makes it impossible to safely process chunks of a CSV file in parallel. In
any chunk other than the first chunk, it is impossible to tell if a newline
occurs inside a quoted string or outside a quoted string. This means that
the file can be processed only from beginning to end by a single process
keeping track of whether it is in the middle of a quoted value. However, the
majority of CSV associated with data processing does not contain quoted
newlines, so it would be a shame if the default behavior were to use the slow,
but specification-compatible, serial processing strategy instead of the faster
parallel processing strategy. As a result, BigQuery defaults to assuming
that no quoted newlines are present in the input data, so the file can
be safely divided up for parallel processing. If your data does contain
quoted newlines, you can set the allowQuotedNewlines property:
load['allowQuotedNewlines'] = True
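The parallelization problem described above is easy to demonstrate: naive chunking on newline characters cannot distinguish a record boundary from a newline inside a quoted field, while a spec-compliant serial parser that tracks quoting state can. A small sketch using Python's standard csv module:

```python
import csv
import io

# One record's second field contains a quoted newline, as the CSV
# specification permits.
data = 'id,comment\n1,"line one\nline two"\n2,ok\n'

# Naive chunking: splitting on newlines miscounts the records, because a
# worker starting mid-file cannot tell whether it is inside a quoted field.
naive_records = data.rstrip("\n").split("\n")

# A spec-compliant serial parser tracks quoting state and gets it right.
real_records = list(csv.reader(io.StringIO(data)))

# naive_records has 4 entries; real_records has 3 (header plus 2 rows),
# with the embedded newline preserved inside the quoted field.
```

This is why the quoted-newline-aware mode must process the file serially, and why BigQuery only pays that cost when you opt in with allowQuotedNewlines.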