gpfdist can uncompress gzip and bzip2 files by default.
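As a minimal sketch of preparing a compressed load file, the commands below gzip a small sample file and verify the archive. The file names and the temporary directory are hypothetical. We assume gpfdist recognizes compressed files by their .gz or .bz2 extension, so the served file keeps its .gz suffix.

```shell
#!/bin/sh
# Sketch only: LOAD_DIR and sample.csv are hypothetical stand-ins for a
# directory served by gpfdist and a real load file.
set -eu

LOAD_DIR=$(mktemp -d)

# Create a tiny stand-in for a load file.
printf '1,alpha\n2,beta\n' > "$LOAD_DIR/sample.csv"

# Compress to sample.csv.gz, keeping the original file untouched.
gzip -c "$LOAD_DIR/sample.csv" > "$LOAD_DIR/sample.csv.gz"

# Verify the archive is intact before exposing it for loading.
gzip -t "$LOAD_DIR/sample.csv.gz"
```

An external table would then point at the compressed file name (for example, a location ending in sample.csv.gz) instead of the plain file.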
To maximize the performance of gpfdist, here are a few points we should consider:
As the number of segments increases, overall parallel processing should be
maximized. We can split a large file into smaller chunks, typically of similar
size, and spread them across all the gpfdist locations. Run gpfdist on as many
network interfaces as possible (be aware of bonded NICs, and be sure to start
enough gpfdist instances to keep them busy). Work should be distributed evenly
across all these resources. In an MPP shared-nothing environment, the load is
only as fast as the slowest node. Any skew in the load file layout will cause
the overall load to bottleneck on that resource.
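The splitting advice above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: it relies on GNU split (the `-n l/N` mode, which produces N chunks of similar size without breaking a line), and the file names, the chunk count, and the two load directories are all hypothetical.

```shell
#!/bin/sh
# Sketch only: split one large load file into similar-sized, line-aligned
# chunks and spread them over the directories served by gpfdist instances.
# All paths and names are hypothetical; requires GNU coreutils split.
set -eu

WORKDIR=$(mktemp -d)
SRC="$WORKDIR/big_load_file.csv"
CHUNKS=4        # total number of chunks to produce
INSTANCES=2     # number of gpfdist directories to fill

# Generate a stand-in for the large load file (1000 rows).
seq 1 1000 | awk '{ printf "%d,row_%d\n", $1, $1 }' > "$SRC"

# l/N: N chunks of roughly equal byte size, never splitting a line.
split -n "l/$CHUNKS" -d "$SRC" "$WORKDIR/chunk_"

# Distribute the chunks round-robin so no directory holds more work.
i=0
for f in "$WORKDIR"/chunk_*; do
    dir="$WORKDIR/load_files$(( i % INSTANCES + 1 ))"
    mkdir -p "$dir"
    mv "$f" "$dir/"
    i=$(( i + 1 ))
done

# Inspect the spread: row counts per chunk should be similar.
wc -l "$WORKDIR"/load_files*/chunk_*
```

Choosing a chunk count that is a multiple of the number of gpfdist directories keeps the round-robin distribution even, which is the point of avoiding skew.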
The gp_external_max_segs configuration parameter controls the maximum number
of segments each gpfdist serves, that is, how many segments can access
external files in parallel. The default value for this parameter is 64. It is
important to keep an even factor between gp_external_max_segs and the number
of gpfdist instances.
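To see what a given cluster is using, we can query the setting directly. The sketch below assumes a running Greenplum cluster, the standard gpconfig and psql utilities on the PATH, and that the parameter is spelled gp_external_max_segs (the spelling used in the Greenplum releases we are aware of). The database name postgres is only a placeholder.

```shell
# Sketch only: inspect the parameter on a running cluster.
# Show the value on the master and the segments.
gpconfig -s gp_external_max_segs

# Or ask from a SQL session (database name is a placeholder).
psql -d postgres -c "SHOW gp_external_max_segs;"
```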
gpfdist is installed in $GPHOME/bin on the Greenplum master and segment servers.
• Starting and stopping gpfdist:
• To start gpfdist:
$ gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/log &
For multiple gpfdist instances on the same ETL host (refer figure
$ gpfdist -d /var/load_files1 -p 8081 -l /home/gpadmin/log1 &