• gpfdist: It serves data files from a directory on the file host or ETL host
so that all Greenplum primary segments can load them in parallel
• gpfdists: It is the secure (SSL-encrypted) version of gpfdist
• file://: It is used to access external data files on a segment host, in a
location that the Greenplum superuser (gpadmin) can access
• gphdfs: It points to files on HDFS
• The SEGMENT REJECT LIMIT clause defines the threshold for single-row
error handling. If we do not specify this clause, the load is all or nothing:
the whole operation fails as soon as the first bad row is encountered.
• FORMAT is used to define the format (for example, TEXT or CSV).
• In the case of DROP EXTERNAL [WEB] TABLE, only the table definition is
dropped; the source data is not disturbed.
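The clauses listed above can be combined as in the following minimal sketch. The table name, column list, host name etlhost, port 8081, and file path are hypothetical, and the sketch assumes a gpfdist process is already serving that directory:

```sql
-- Hypothetical readable external table served by gpfdist.
-- etlhost:8081 and the sales/*.csv path are assumptions for illustration.
CREATE EXTERNAL TABLE ext_sales (
    id     int,
    name   text,
    amount numeric
)
LOCATION ('gpfdist://etlhost:8081/sales/*.csv')
FORMAT 'CSV' (HEADER)
-- Tolerate up to 10 bad rows per segment before the load fails.
SEGMENT REJECT LIMIT 10 ROWS;

-- Removes only the table definition; the CSV files are left untouched.
DROP EXTERNAL TABLE ext_sales;
```

Leaving out the SEGMENT REJECT LIMIT line would make any query against ext_sales fail on the first malformed row.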
Web external tables in Greenplum are used to handle dynamic data sources. Web
external tables can either be command-based or URL-based.
Command-based web external tables get their data from the output of a shell
script or command. The command or script must reside on the hosts where it
runs and is specified within the EXECUTE clause.
By default, the command runs on all segment hosts, once in every segment instance.
We can control the number of segment instances that run the command. The ON
clause specifies the hosts or segment instances on which the command is run.
An example is shown as follows:
CREATE EXTERNAL WEB TABLE test_output
(id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON HOST
FORMAT 'TEXT' (DELIMITER '|');
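As a rough illustration of how the ON clause narrows where the command runs, the following sketch reuses the same script path with three variants. The table names and the host name sdw1 are hypothetical:

```sql
-- Run the script once, on the master host only.
CREATE EXTERNAL WEB TABLE test_output_master
(id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON MASTER
FORMAT 'TEXT' (DELIMITER '|');

-- Run the script on 3 segment instances; the system picks which ones.
CREATE EXTERNAL WEB TABLE test_output_three
(id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON 3
FORMAT 'TEXT' (DELIMITER '|');

-- Run the script on the primary segment instances of one named host
-- (sdw1 is an assumed host name).
CREATE EXTERNAL WEB TABLE test_output_one_host
(id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON HOST 'sdw1'
FORMAT 'TEXT' (DELIMITER '|');
```

Because every instance that runs the script contributes its output as rows, the choice of ON target directly affects how many copies of the data a query returns.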
URL-based web tables get data from files on a web server using the HTTP protocol.
The LOCATION clause is used to define the list of files on a web server using the
http:// protocol. The web data files are expected to be accessible to the Greenplum