any project or dataset as long as the job creator (the identity used to insert
the job into the service) has write permissions on the dataset containing
the table. This was explained in Chapter 4 in the section on access controls.
If the job creator does not have suitable permissions, the job will still be
created but will end up in the done state and include an error indicating that
access was denied. Here is the code snippet specifying a destination for a
load job:
load_config = {
    'destinationTable': {
        'projectId': auth.PROJECT_ID,
        'datasetId': 'ch06',
        'tableId': 'example_basic'
    }
}
load_config['schema'] = {
    'fields': [
        # …
    ]
}
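To show how this configuration might be used end to end, here is a minimal sketch of inserting the load job and polling it until it reaches the done state, where a permissions failure would surface as an errorResult. It assumes a service object built with google-api-python-client (as used elsewhere in this book's samples), the auth.PROJECT_ID value from the snippet above, and a hypothetical local data file example_basic.csv; none of these are defined here.
import time
from googleapiclient.http import MediaFileUpload

# Hypothetical local file to upload; any CSV matching the schema works.
upload = MediaFileUpload('example_basic.csv',
                         mimetype='application/octet-stream')

# Wrap the load configuration in a job resource and insert it.
job = service.jobs().insert(
    projectId=auth.PROJECT_ID,
    body={'configuration': {'load': load_config}},
    media_body=upload).execute()

# Poll until the job reaches the done state. A permissions problem
# shows up here as an errorResult, not as a failure at insert time.
job_id = job['jobReference']['jobId']
while True:
    status = service.jobs().get(
        projectId=auth.PROJECT_ID, jobId=job_id).execute()['status']
    if status['state'] == 'DONE':
        break
    time.sleep(1)
if 'errorResult' in status:
    print('Load failed:', status['errorResult'])  # e.g. access denied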
The schema is optional if the specified destination table already exists and
has a schema. In that case the load job uses the table's existing schema to
interpret the uploaded data. If the table does not exist or has no schema,
the job must specify one. If the table has a schema and the job also
specifies one, the two must be compatible. Compatibility here means that
every field present in the existing table schema must appear, with the same
type, in the job schema. This effectively means that load jobs can add new
columns to a table but can never remove them. The previous snippet shows
you where the schema is specified in the job configuration.
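To make the compatibility rule concrete, here is a small hypothetical example; the field names are invented for illustration and do not come from the book's sample tables.
# Existing table schema: two fields.
existing_schema = {
    'fields': [
        {'name': 'id', 'type': 'INTEGER'},
        {'name': 'name', 'type': 'STRING'}
    ]
}

# Compatible job schema: repeats every existing field with the same
# type and adds a new column, so the load widens the table.
compatible_schema = {
    'fields': [
        {'name': 'id', 'type': 'INTEGER'},
        {'name': 'name', 'type': 'STRING'},
        {'name': 'created', 'type': 'TIMESTAMP'}  # new column
    ]
}

# Incompatible job schema: drops 'name', which exists in the table
# schema, so a load job specifying this schema will fail.
incompatible_schema = {
    'fields': [
        {'name': 'id', 'type': 'INTEGER'}
    ]
}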
Multiple load jobs running concurrently and attempting to modify the
schema of a table can fail in unpredictable ways. Frankly, if you are in this
situation, you probably should rethink your loading strategy. However, it
is worth understanding the nature of the problem. Before spelling it out,
it is worth recalling the discussion of the ACID properties of jobs:
(A)tomic, (C)onsistent, and (D)urable were covered, but we indicated
that (I)solated was not relevant because load jobs do not depend on the