Loading Data - Google BigQuery Analytics

In this case you can see that the job failed because you lack sufficient

permissions to create a table in the specified dataset.

To show the kinds of data errors that can be generated, the following job
configuration loads an input file that is riddled with errors:

load_config['schema'] = {
  'fields': [
    {'name':'string_f', 'type':'STRING'},
    {'name':'boolean_f', 'type':'BOOLEAN'},
    {'name':'integer_f', 'type':'INTEGER',
     'mode':'REQUIRED'},
    {'name':'float_f', 'type':'FLOAT'},
    {'name':'timestamp_f', 'type':'TIMESTAMP'}
  ]
}

load_config['sourceUris'] = [
  'gs://bigquery-e2e/chapters/06/sample_bad.csv',
]
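For context, here is a minimal sketch of how a load configuration like this one might be wrapped in a job resource before it is submitted. The `configuration.load` nesting follows the BigQuery jobs resource; the helper name `build_load_job_body` and the destination table identifiers are hypothetical placeholders, not values from the sample.

```python
import copy


def build_load_job_body(load_config):
    """Wrap a load configuration in a job resource dict (hypothetical helper)."""
    # Copy so later edits to load_config do not change the job body.
    return {'configuration': {'load': copy.deepcopy(load_config)}}


load_config = {
    # Placeholder destination; substitute your own project, dataset, and table.
    'destinationTable': {
        'projectId': 'my-project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table',
    },
}
load_config['schema'] = {
    'fields': [
        {'name': 'string_f', 'type': 'STRING'},
        {'name': 'boolean_f', 'type': 'BOOLEAN'},
        {'name': 'integer_f', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'float_f', 'type': 'FLOAT'},
        {'name': 'timestamp_f', 'type': 'TIMESTAMP'},
    ]
}
load_config['sourceUris'] = [
    'gs://bigquery-e2e/chapters/06/sample_bad.csv',
]

job_body = build_load_job_body(load_config)

# List the fields that must have a value in every input row.
required_fields = [
    field['name']
    for field in job_body['configuration']['load']['schema']['fields']
    if field.get('mode') == 'REQUIRED'
]
print(required_fields)
```

Copying the configuration keeps the job body stable if `load_config` is reused and modified for a later job.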

The example also changes the schema to declare one field, integer_f, as
REQUIRED, to illustrate that an error is reported when no value is
supplied for a required field. When you run the sample that loads this data,
you can observe seven errors that look like this:

{
  "reason": "invalid",
  "message": "Could not interpret bool from string.",
  "location": "File: 0 / Line:2 / Field:2"
}

These errors report bad records that were rejected. The reason code is
always invalid, and the message field describes the problem. In this case
you passed the value “nottrue” for a boolean field, which was not recognized.
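As a rough sketch, errors of this shape could be tallied from a finished job as follows. It assumes the job resource is already available as a Python dict with the errors under `status.errors`; the helper name `summarize_errors` and the hand-built `job` dict are illustrative only, and the dict contains just the single error shown above rather than real job output.

```python
from collections import Counter


def summarize_errors(job):
    """Count a job's errors by (reason, message); hypothetical helper."""
    errors = job.get('status', {}).get('errors', [])
    return Counter((e.get('reason'), e.get('message')) for e in errors)


# Hand-built stand-in for a job resource, holding only the error shown above.
job = {
    'status': {
        'errors': [
            {
                'reason': 'invalid',
                'message': 'Could not interpret bool from string.',
                'location': 'File: 0 / Line:2 / Field:2',
            },
        ]
    }
}

for (reason, message), count in summarize_errors(job).items():
    print('%d x [%s] %s' % (count, reason, message))

# Print each error's location so the bad record can be found in the input.
for error in job['status']['errors']:
    print(error.get('location', '(no location)'))
```

Grouping by message makes it easier to see whether a load failed on one recurring problem or on several different ones.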

These errors also contain a location entry describing where the error was

encountered. The location may refer to a specific field but in some cases—for

example, too many or too few columns—it just refers to the line or position

at which the error occurred. When the input file is processed in parallel, the

location will be reported as the byte offset of the start of the line rather than
