$ bq cp ch11.time_lapse@1384381614244 ch11.recovered
Table '317752944021:ch11.time_lapse@1384381614244' successfully copied to '317752944021:ch11.recovered'
$ bq head ch11.recovered
+-------+---------------+
| index |    millis     |
+-------+---------------+
|     5 | 1384381611449 |
|     2 | 1384381603871 |
|     4 | 1384381608691 |
|     3 | 1384381606536 |
|     1 | 1384381601244 |
+-------+---------------+
You can see that the recovered table is missing the last 5 rows added to the
original table. We picked the timestamp of the sixth row (1384381614244),
so the sixth row and all rows after it have been dropped. The main challenge
with using this technique is finding the right timestamps. Unfortunately, the
timestamps correspond to the completion time of the job that added the
data, rather than the last modification time of the table, which can differ
slightly. To remove data added by a particular job, you must find the job
metadata or adjust the timestamp via trial and error. We will leave it as an
exercise for you to figure out how to fix a table that has bad data
sandwiched between good data, for example, when a valid import follows a
bad one; one possible approach is sketched below.
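As a rough sketch of one way to approach that exercise (the job ID and the
boundary timestamps below are hypothetical placeholders, and @0 refers to
the oldest available snapshot of the table, assuming its history reaches
back far enough), you could read the bad job's start and end times from its
metadata, copy the rows committed before it, and then append the rows
committed after it:
$ # Locate the offending load job; the job ID shown is hypothetical.
$ bq ls -j -n 10
$ # Its startTime and endTime (milliseconds since the epoch) appear
$ # in the job statistics.
$ bq show --format=prettyjson -j bqjob_r1234_00001 | grep -E 'startTime|endTime'
$ # Copy everything committed before the bad job started...
$ bq cp 'ch11.time_lapse@0-<BAD_START_MS>' ch11.fixed
$ # ...then append everything committed after it finished.
$ bq cp --append_table 'ch11.time_lapse@<BAD_END_MS>-' ch11.fixed
As noted above, the job times and the table's snapshot times can differ
slightly, so the boundary timestamps may still need a small trial-and-error
adjustment.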
Caveats
Tables being updated via the streaming API deserve special attention. Time
slices that have no end time will include all rows that have been inserted
successfully (subject to the caveats discussed in Chapter 6 with respect to
streaming). Beyond that, the behavior is loosely defined. Behind the scenes,
rows are buffered and then inserted in batches. Each batch insertion
behaves like a job, and all the rows in the batch appear atomically at a
specific timestamp. The fact that a row is associated with two separate
timestamps, the row insert and the batch insert, leads to surprising
behavior when slicing these tables. However, it is reasonable to use slices
to isolate recently added records, namely with an interval that has no end
time. This will at least fetch every row that was successfully inserted
after the start of the interval.
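For example (the table name and the start timestamp here are hypothetical),
a range decorator with an open end in a legacy SQL query selects everything
committed at or after the start time:
$ bq query 'SELECT * FROM [ch11.streamed@<START_MS>-]'
Because the interval has no end time, it covers rows still sitting in the
streaming buffer as well as batches that have already been committed.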