Database Reference
In-Depth Information
Upsert: Performs a MERGE on the destination table, where new data
is inserted and existing data is updated. You will need to specify one
or more columns that will be used to join the data.
FastAppend: As its name implies, FastAppend is the fastest way to
load data into a destination table. The trade-off is that it does not sup-
port rollback; in the case of a failure, you are responsible for remov-
ing any partially inserted rows. FastAppend will also bypass the sta-
ging database, causing high levels of fragmentation.
Let's take a moment to discuss how to use these modes with two common load pat-
terns. If you are performing regular, incremental loads on a large table (say, updating a
transactional sales table with the previous day's orders), you should load the data direc-
tly using Append, since no transformations are required. Now let's say you're loading
the same data, but you plan to instead transform the data and load into a mart before
deleting the temporary data. This second example would be better suited to the FastAp-
pend mode. Or, to say it more concisely, use FastAppend any time you are loading into
an empty, intermediate working table.
There is one last option we need to discuss. Underneath the Loading Mode is a
checkbox for Roll-Back Load on Table Update or Insert Failure. In order to understand
this option, you need to understand a little about how data is loaded into PDW. When
data is loaded using the Append, Reload, or Upsert modes, PDW performs a two-phase
load. In Phase 1, the data is loaded into the staging database. In Phase 2, PDW per-
forms an INSERT / SELECT of the sorted data into the final destination table. By de-
fault, data is loaded in parallel on all Compute nodes, but it is loaded serially within a
Compute node to each distribution. This is necessary in order to support rollback.
Roughly 85-95% of the load process is spent in Phase 1. When Roll-Back Load on
Table Update or Insert Failure is deselected, each distribution is loaded in parallel in-
stead of serially during Phase 2. So, in other words, deselecting this option will im-
prove performance but only affects 5-15% of the overall process. Also, deselecting this
option removes PDW's ability to roll back; in the event of a failure during Phase 2, you
would be responsible for cleaning up any partially inserted data.
Because of the potential risk and minimal gain, it is best practice to deselect this
option only when you are loading to an empty table. FastAppend is unaffected by this
option because it always skips Phase 2 and loads directly into the final table, which is
why FastAppend also does not support rollback.
Search WWH ::




Custom Search