Database Reference
gpfdist can uncompress gzip and bzip2 files by default.
To maximize the performance of gpfdist, consider the following points.
As the number of segments increases, the load should be parallelized as far as possible. Split a large file into smaller chunks, typically of similar size, and share them across all the gpfdist locations. Run gpfdist on as many network interfaces as possible; be aware of bonded NICs, and be sure to start enough gpfdist instances to keep them busy. Work should be distributed evenly across all these resources.
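As a rough illustration of the splitting step, the sketch below cuts a stand-in load file into four chunks of similar byte size without breaking rows. It assumes GNU coreutils (`split -n l/N` is a GNU extension); the temporary directory, the file name big_load_file.txt, and the chunk_ prefix are placeholders, not names from the text.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical working area; in practice each chunk would be moved to a
# directory served by one gpfdist instance.
workdir=$(mktemp -d)
src="$workdir/big_load_file.txt"

# Build a small stand-in for the large load file (1000 rows).
seq 1 1000 > "$src"

# Split into 4 chunks of roughly equal byte size, keeping each row whole.
# -n l/N never splits a line; -d gives numeric suffixes (chunk_00, chunk_01, ...).
chunks=4
split -n "l/$chunks" -d "$src" "$workdir/chunk_"

# Show the row count of each chunk so the sizes can be compared.
wc -l "$workdir"/chunk_*
```

Because the split is by bytes, the row counts per chunk are close but not necessarily identical.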
In an MPP shared-nothing environment, the load runs only as fast as the slowest node. Any skew in the load file layout will cause the overall load to bottleneck on that resource.
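One simple way to spot skew before a load is to compare the smallest and largest input files. The sketch below is only an illustration: skew_report is a hypothetical helper, and the three demo files (two of 100 bytes, one of 400 bytes) are created just to show a skewed layout.

```shell
#!/usr/bin/env bash
set -e

# Demo directory with deliberately uneven files (placeholder data).
demo_dir=$(mktemp -d)
head -c 100 /dev/zero > "$demo_dir/chunk_00"
head -c 100 /dev/zero > "$demo_dir/chunk_01"
head -c 400 /dev/zero > "$demo_dir/chunk_02"

# Hypothetical helper: print the smallest and largest file size in a directory.
skew_report() {
  dir=$1
  min=""
  max=""
  for f in "$dir"/*; do
    # Arithmetic expansion strips any padding that wc may add.
    size=$(( $(wc -c < "$f") ))
    if [ -z "$min" ] || [ "$size" -lt "$min" ]; then min=$size; fi
    if [ -z "$max" ] || [ "$size" -gt "$max" ]; then max=$size; fi
  done
  echo "min_bytes=$min max_bytes=$max"
}

skew_report "$demo_dir"
```

A large gap between the two numbers suggests that the resource serving the largest file is likely to become the bottleneck.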
The gp_external_max_segs configuration parameter controls the maximum number of segments that each gpfdist process serves, that is, how many segments can access external files in parallel. The default value of this parameter is 64. It is important to keep gp_external_max_segs and the number of gpfdist processes an even factor: the gp_external_max_segs value should be a multiple of the number of gpfdist processes.
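The arithmetic behind the even-factor advice can be sketched as follows. This is an illustration only: it assumes segment connections are handed out to the gpfdist processes in round-robin order, and segments_per_gpfdist is a hypothetical helper, not a Greenplum utility.

```shell
#!/usr/bin/env bash

# Hypothetical helper: given a number of connecting segments and a number of
# gpfdist processes, print how many segments each process would serve under
# an assumed round-robin assignment.
segments_per_gpfdist() {
  segs=$1
  procs=$2
  i=0
  while [ "$i" -lt "$procs" ]; do
    share=$(( segs / procs ))
    # The first (segs % procs) processes receive one extra segment.
    if [ "$i" -lt $(( segs % procs )) ]; then
      share=$(( share + 1 ))
    fi
    echo "gpfdist[$i]: $share segments"
    i=$(( i + 1 ))
  done
}

# 64 is a multiple of 4, so every process serves the same number of segments.
segments_per_gpfdist 64 4

# 64 is not a multiple of 6, so the shares are uneven.
segments_per_gpfdist 64 6
```

With 4 processes each one serves 16 segments; with 6 processes the first four serve 11 and the last two serve 10, which is the kind of imbalance the even-factor rule avoids.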
gpfdist is installed in $GPHOME/bin on the Greenplum master and segment hosts.
• Starting and stopping gpfdist:
• To start gpfdist:
    $ gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/log &
For multiple gpfdist instances on the same ETL host (refer to the figure on page 13), use a different base directory and port for each instance. For example:
    $ gpfdist -d /var/load_files1 -p 8081 -l /home/gpadmin/log1 &
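The "different base directory and port per instance" rule can be sketched as a small dry-run helper that prints one command per instance instead of starting anything. print_gpfdist_commands is hypothetical, and the numbered directories, log files, and consecutive ports are assumptions that simply extend the naming pattern of the example above.

```shell
#!/usr/bin/env bash

# Hypothetical dry-run helper: print (do not execute) one gpfdist command per
# instance, giving each instance its own base directory, port, and log file.
print_gpfdist_commands() {
  count=$1
  base_port=$2
  i=1
  while [ "$i" -le "$count" ]; do
    port=$(( base_port + i - 1 ))
    # Directory and log names are assumed; adjust them to the real ETL host layout.
    echo "gpfdist -d /var/load_files$i -p $port -l /home/gpadmin/log$i &"
    i=$(( i + 1 ))
  done
}

# Two instances, starting at port 8081.
print_gpfdist_commands 2 8081
```

Reviewing the printed commands before running them makes it easy to confirm that no two instances share a port or a base directory.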