Client-selected job IDs prevent the creation of duplicate instances of a job
due to retries in the client or communication layer. The reason for avoiding
duplication is easy to see in the context of WRITE_APPEND jobs. If you
insert two jobs that are configured to append the same data to a given
destination table, the table will end up with two copies of the data, which is
almost certainly not what you want. Of course, you would not intentionally
insert duplicate jobs, but it is easy for error-handling code to inadvertently
generate duplicate requests.
When a Jobs.insert() request fails due to a network error, it is possible
that BigQuery actually performed the insert operation but the client failed
to receive the success response. Without a client-selected job ID, the only
way to detect this condition is to list all the jobs and see if a job exists with
the configuration you were trying to submit. However, if you selected a job
ID before issuing the request, you could simply retry inserting the job with
the same ID. If BigQuery has already accepted the job, it responds with an
"already exists" error; otherwise, the retried insert succeeds. This way you can
guarantee that an append operation happens exactly once.
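The retry pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the real BigQuery client: FakeJobService, NetworkError, AlreadyExistsError, and insert_exactly_once are hypothetical names, and the service is an in-memory stand-in that accepts a job and then deliberately loses the first response.

```python
class NetworkError(Exception):
    """Simulates a response that never reached the client."""


class AlreadyExistsError(Exception):
    """Raised by the stand-in service when a job ID was already accepted."""


class FakeJobService:
    """In-memory stand-in for a job service (hypothetical, not the BigQuery API)."""

    def __init__(self, drop_first_response=True):
        self.jobs = {}    # job_id -> rows submitted under that ID
        self.table = []   # destination table contents
        self._drop_next = drop_first_response

    def insert(self, job_id, rows):
        if job_id in self.jobs:
            raise AlreadyExistsError(job_id)
        # The job is accepted and the rows are appended (WRITE_APPEND).
        self.jobs[job_id] = list(rows)
        self.table.extend(rows)
        if self._drop_next:
            # The work was done, but the client never hears about it.
            self._drop_next = False
            raise NetworkError("response lost")


def insert_exactly_once(service, job_id, rows, max_attempts=3):
    """Retry with the SAME job ID; an already-exists error counts as success."""
    for _ in range(max_attempts):
        try:
            service.insert(job_id, rows)
            return "inserted"
        except AlreadyExistsError:
            # An earlier attempt was accepted, so nothing more needs to be done.
            return "already_exists"
        except NetworkError:
            # Outcome unknown: retry with the same ID.
            continue
    raise RuntimeError("job %s could not be confirmed" % job_id)


service = FakeJobService()
outcome = insert_exactly_once(service, "my_data_20240115", ["row1", "row2"])
```

In this run the first attempt is accepted but its response is lost, so the retry is rejected as a duplicate and the table still holds a single copy of the rows. Had the client generated a fresh ID for the retry, the stand-in service would have appended the rows twice.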
The bq command-line client supports explicitly specifying the job ID for any
operation that creates a job:
bq --job_id=<job id> load …
To use this correctly you need to select an ID that corresponds to the specific
data you are trying to load so that multiple instances of the same command
do not duplicate the job. For example, if the data were collected on a
particular day, you could use an ID of the form my_data_YYYYMMDD. The
client also supports a flag that computes the ID as a function of the job
configuration it constructs:
bq --fingerprint_job_id load …
This way you do not have to come up with an ID generation scheme but
retain the important property of not doing the same work more than once.
Because the ID is a hash of the configuration, its value will not be
meaningful, but if you do not need to look up specific jobs at a later time,
this is not an issue.
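Both ID schemes are easy to sketch in Python. The helper names dated_job_id and fingerprint_job_id are hypothetical, and the hashing shown here (SHA-256 over a canonical JSON encoding) is only an assumption for illustration; it is not necessarily how the bq client computes its fingerprint.

```python
import hashlib
import json
from datetime import date


def dated_job_id(prefix, day):
    """Build an ID of the form <prefix>_YYYYMMDD from the day the data covers."""
    return f"{prefix}_{day:%Y%m%d}"


def fingerprint_job_id(config, prefix="job"):
    """Derive an ID from the job configuration (illustrative scheme only)."""
    # Sorting keys makes the encoding independent of dict insertion order,
    # so equal configurations always hash to the same ID.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}"


# Hypothetical load configuration, used only to exercise the helpers.
config = {
    "sourceUris": ["gs://example-bucket/data.csv"],
    "writeDisposition": "WRITE_APPEND",
}
by_date = dated_job_id("my_data", date(2024, 1, 15))
by_hash = fingerprint_job_id(config)
```

The dated ID stays readable and easy to look up later, while the hashed ID requires no naming scheme at all. Either way, running the same command twice yields the same ID, so the second insert is rejected instead of duplicating the work.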
With multiple writers, it is not always convenient to coordinate them so
that they select a suitable job ID. If the operation they are