BigQuery tables support fields that are arrays and fields that are nested
within other fields. When a table contains such fields, it is not possible to
represent a record in the table as a simple list of values. The CSV family of
formats was designed to represent tabular data, so each record or line of
data is a simple list of values. As a result, the CSV input format is not
compatible with BigQuery schemas that contain fields that are arrays or have
type RECORD. If you are constrained to using CSV as an input format, you
should avoid schemas that include these features.
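For concreteness, here is a sketch of the kind of schema CSV cannot feed. The field names are illustrative, not from the original text; loading a table like this requires an input format that can express nesting, such as newline-delimited JSON.

```python
# Illustrative only: a schema with a REPEATED RECORD field. CSV has no way
# to express the nested, repeated structure of 'addresses'.
nested_config = {
    'schema': {
        'fields': [
            {'name': 'name', 'type': 'STRING'},
            {'name': 'addresses',          # an array of nested records
             'type': 'RECORD',
             'mode': 'REPEATED',
             'fields': [
                 {'name': 'city', 'type': 'STRING'},
                 {'name': 'zip', 'type': 'STRING'},
             ]},
        ]
    }
}
```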
To understand how CSV formatted data is turned into a BigQuery record,
consider the following concrete schema:
load_config['schema'] = {
    'fields': [
        {'name': 'string_f', 'type': 'STRING'},
        {'name': 'boolean_f', 'type': 'BOOLEAN'},
        {'name': 'integer_f', 'type': 'INTEGER'},
        {'name': 'float_f', 'type': 'FLOAT'},
        {'name': 'timestamp_f', 'type': 'TIMESTAMP'}
    ]
}
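For context, a schema assignment like this typically sits inside a larger load configuration. The sketch below is hedged: the key names follow the BigQuery jobs.insert "configuration.load" request body, and the project, dataset, and table values are placeholders, not names from the original text.

```python
# Hedged sketch of the surrounding load configuration. 'sourceFormat'
# selects the CSV input format discussed here; the destinationTable
# identifiers are placeholders.
load_config = {
    'sourceFormat': 'CSV',
    'destinationTable': {
        'projectId': 'my-project',
        'datasetId': 'my_dataset',
        'tableId': 'csv_example',
    },
}
# ... followed by the schema assignment shown above.
```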
Because the default mode for a field is NULLABLE, all the fields in this
schema are optional. Now look at a few lines of CSV to see how they are
transformed into records.
"one",true,1,1.0,2011-11-11 11:11:11
,,,,
"",false,,3.14e-1,1380378423
bare string ,"TRUE","0","0.000","2013-01-03
09:15:02.478 -05:00"
"quoted , and "" in a string",,,,
All the preceding lines import correctly into the table. The field mapping
is purely positional, so the order of the fields in the schema is
significant, and the order of the values in the CSV data must match it.
The first line illustrates the basic formatting of values. An unquoted empty
string is interpreted as a missing or null value, so the second line generates
a record with null values for every field. The fourth line shows
that quoting is optional for strings that do not contain the field-separating
character.
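The positional mapping can be sketched with Python's standard csv module. One caveat makes this a rough approximation: csv.reader cannot distinguish a quoted empty string ("") from an unquoted empty field, so the sketch treats both as null, whereas BigQuery treats only the unquoted form as null.

```python
import csv
import io

# Schema field names in order; values map to fields strictly by position.
FIELDS = ['string_f', 'boolean_f', 'integer_f', 'float_f', 'timestamp_f']

csv_data = '"one",true,1,1.0,2011-11-11 11:11:11\n,,,,\n'

records = []
for row in csv.reader(io.StringIO(csv_data)):
    # Approximate BigQuery's rule: an empty value becomes null.
    records.append({name: (value or None)
                    for name, value in zip(FIELDS, row)})
```

After running this, the second record holds None for every field, mirroring how the all-empty CSV line above produces a record of nulls.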