Database Reference
In-Depth Information
existing contents of a table. This was something of a white lie. It is true that
they have no access to the data already present in a table. However, they do
depend on the existing schema of the table and as mentioned can modify the
existing schema by adding columns. In this respect isolation is relevant and
load jobs are isolated from other load jobs because they capture a copy of the
destination table's schema at the time they are created and do not observe
changes to the schema that may occur while they are running. This can lead
to the job eventually failing because the schema of the table changes while it
is in progress.
If all this seems too complicated, follow this simple rule of thumb: Do not
modify the schema of an existing table by specifying a schema on a job.
Instead modify the schema by updating the table, and do this only when
there are no pending or running jobs. However, if you are interested in how
this unpredictable behavior can arise, here is a concrete example. Assume
that the table being updated has columns A and B. Load job 1 adds column
C, and load job 2 adds columns C and D. Both jobs can be legally started
at the same time because they are both performing compatible (with the
existing table) schema updates. If job 1 completes before job 2, both jobs
can succeed because the schema update of job 2 is adding a column (D)
in addition to the column (C) added by job 1. However, if job 2 completes
before job 1, job 1 will fail because the schema it specified removes a column
(D) with respect to the schema specified in job 2. Because the completion
order of concurrently executing jobs is not guaranteed, this implies that job
1 may fail randomly, which is unwanted. Also after job 2 succeeds it will be
impossible to retry job 1.
In addition to specifying where the load job should put the data, you can also
tune how the load job behaves if the existing destination table does or does
not exist, and what it does with any data that might already be present in the
destination table. This behavior is tuned by controlling the create and write
dispositions of the job. These properties were briefly described in Chapter
4 in the overview of the Jobs collection. You can control the dispositions by
modifying the job configuration:
load['createDisposition'] = 'CREATE_IF_NEEDED'
load['writeDisposition'] = 'WRITE_TRUNCATE'
Table 6.2 shows the supported dispositions and how they affect the
operation of the job:
 
Search WWH ::




Custom Search