pgloader is written in Python and connects to PostgreSQL through the standard Python client
interface. pgloader is less efficient than loading data files with a COPY command, but COPY
has many more restrictions: the file must already be in the right place on the server, must be
in the right format, and must be unlikely to throw errors on load. pgloader has additional
overhead, but it can also load data using multiple parallel threads, so it can end up faster
overall. pgloader's ability to call out to reformat functions written in Python is often
essential; straight COPY is frequently too simple. pgloader also allows loading from
fixed-width files, which COPY cannot do.
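To make the fixed-width point concrete, here is a minimal sketch of the kind of reformatting that has to happen before such data can be loaded. It is not pgloader's own code: the LAYOUT table, the column names, and both helper functions are hypothetical, and the sketch simply slices each record by position and writes CSV.

```python
import csv
import io

# Hypothetical layout: (column name, start offset, width). Not taken from pgloader.
LAYOUT = [("id", 0, 5), ("name", 5, 10), ("city", 15, 8)]

def split_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a list of stripped field values."""
    line = line.rstrip("\n")
    return [line[start:start + width].strip() for _, start, width in layout]

def fixed_width_to_csv(lines, layout=LAYOUT):
    """Convert fixed-width records into CSV text, one row per record."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(split_fixed_width(line, layout))
    return out.getvalue()

sample = [
    "00001Alice     London  \n",
    "00002Bob       Paris   \n",
]
csv_text = fixed_width_to_csv(sample)
```

The resulting text is in a delimited form that a CSV-style load could accept, which is the conversion step that plain COPY leaves to you.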
There's more...
If you need to reload the table completely from scratch, specify --truncate on the
pgloader command line.
After loading, if there were load errors, some junk will have been loaded into the PostgreSQL
tables. It is not junk that you can see, or that causes any semantic errors; think of it more
like fragmentation. Consider whether you need to run with --vacuum as an additional option,
though this may make the load take much longer.
We need to be careful to avoid loading data twice. The only easy way of doing that is to make
sure there is at least one unique index defined on every table that you load. A repeated load
should then fail very quickly.
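The effect of a unique index on a repeated load can be sketched as follows. To keep the example self-contained, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the orders table, the index name, and the load_rows helper are all hypothetical. The behavior to notice is the same in both systems: the second load hits the unique index and is rolled back as a whole.

```python
import sqlite3

def load_rows(conn, rows):
    """Insert all rows in one transaction; roll everything back on a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False

# sqlite3 is used here only as a stand-in for a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("CREATE UNIQUE INDEX orders_order_id_key ON orders (order_id)")

batch = [(1, 9.99), (2, 24.50)]
first = load_rows(conn, batch)    # initial load succeeds
second = load_rows(conn, batch)   # accidental reload is rejected
row_count = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
```

Because the duplicate is detected on the first conflicting row, the failed reload leaves the table with exactly the rows from the first load.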
String handling can often be difficult, because of the presence of formatting or non-printable
characters. In PostgreSQL 9.0 and earlier, the parameter named standard_conforming_strings
is off by default (from 9.1 onward, it defaults to on). When it is off, backslashes are
assumed to be escape characters. Put another way, the string '\n' then means linefeed, which
can cause data to appear truncated. You'll need to set standard_conforming_strings = on, or
you'll need to specify an escape character in the load-parameter file.
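The following sketch illustrates why backslash escapes can mangle data. It is a deliberately simplified model, not PostgreSQL's actual parser: process_escapes handles only \n, \t, and \\, and both function names are hypothetical.

```python
def escape_backslashes(value):
    """Double every backslash so that escape processing yields the original text."""
    return value.replace("\\", "\\\\")

def process_escapes(literal):
    """Simplified model of backslash-escape processing (handles n, t, and backslash)."""
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            out.append(mapping.get(nxt, nxt))  # unknown escapes keep the character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

raw = r"C:\new\table.txt"
mangled = process_escapes(raw)                       # backslashes consumed as escapes
restored = process_escapes(escape_backslashes(raw))  # doubling them first preserves the value
```

In this model, the unescaped value gains a linefeed and a tab where the original had backslashes, while the pre-escaped value comes through unchanged.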
If you are reloading data that has been unloaded from PostgreSQL, then you may want to
use the pg_restore utility instead. The pg_restore utility has an option to reload data in
parallel, -j number_of_threads, though this is only possible if the dump was produced using
the custom pg_dump format. Refer to the recipes in the Backup chapter for more details. This
can be useful for reloading dumps, though it lacks almost all of the other pgloader features
discussed here.
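As a rough illustration, the dump and parallel restore commands could be assembled like this. The database names, the file name, and the build_restore_commands helper are hypothetical; the sketch only builds the argument lists (using pg_dump's -Fc for the custom format and pg_restore's -j for the number of parallel jobs) and does not run anything.

```python
import shlex

def build_restore_commands(source_db, target_db, dump_path, jobs):
    """Return (pg_dump argv, pg_restore argv) for a custom-format parallel reload."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    # -Fc selects the custom dump format, which parallel restore requires.
    dump_cmd = ["pg_dump", "-Fc", "-f", dump_path, source_db]
    # -j sets the number of parallel restore jobs; -d names the target database.
    restore_cmd = ["pg_restore", "-j", str(jobs), "-d", target_db, dump_path]
    return dump_cmd, restore_cmd

dump_cmd, restore_cmd = build_restore_commands("sales", "sales_copy", "sales.dump", 4)
dump_line = " ".join(shlex.quote(arg) for arg in dump_cmd)
restore_line = " ".join(shlex.quote(arg) for arg in restore_cmd)
```

Either list could be passed to subprocess.run once you have checked the names against your own environment.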
See also
You may wish to send an e-mail to Dimitri Fontaine, the current author and maintainer
of most of pgloader. He always loves to receive e-mails from users.
 