• gpfdist: Points to a directory on the file host or ETL host and serves all external data files to the Greenplum primary segments in parallel
• gpfdists: The secure (SSL-encrypted) variant of gpfdist
• file://: Accesses external data files on a segment host that the superuser (gpadmin) can access
• gphdfs: Points to files on HDFS
• The SEGMENT REJECT LIMIT clause defines the criteria for single-row error handling. If this clause is omitted, the load is all or nothing: the whole operation fails as soon as the first bad row is encountered.
• FORMAT defines the format of the data (for example, TEXT or CSV).
• DROP EXTERNAL [WEB] TABLE drops only the table definition; the source data is not disturbed.
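The clauses above can be combined in a single definition. The following is a minimal sketch, assuming a gpfdist process is already serving files from an ETL host; the host name etlhost-1, port 8081, file pattern, table name, and columns are all hypothetical.

```sql
-- Hypothetical readable external table served by gpfdist.
-- etlhost-1, port 8081, and the column list are illustrative assumptions.
CREATE EXTERNAL TABLE ext_expenses
    (name text, amount numeric, category text)
LOCATION ('gpfdist://etlhost-1:8081/*.txt')
FORMAT 'TEXT' (DELIMITER '|')
-- Tolerate malformed rows; the load aborts once 10 rows
-- have been rejected on any one segment.
SEGMENT REJECT LIMIT 10 ROWS;

-- Removes only the table definition; the files behind gpfdist are untouched.
DROP EXTERNAL TABLE ext_expenses;
```

Without the SEGMENT REJECT LIMIT line, a single malformed row would cause any query against this table to fail.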
Web external tables in Greenplum are used to handle dynamic data sources. Web
external tables can either be command-based or URL-based.
Command-based web external tables get their data from the output of a shell
script or command. The command or script must reside on the hosts where it
will run and is specified in the EXECUTE clause.
By default, the command is run on all segment hosts, by every segment instance.
The ON clause controls this: it can limit the number of segment instances that
run the command or name the hosts on which the command is run.
An example is shown as follows:
CREATE EXTERNAL WEB TABLE test_output
(id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON HOST
FORMAT 'TEXT' (DELIMITER '|');
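The ON clause also accepts forms other than ON HOST. As a sketch, assuming the same script exists on every segment host, the variant below restricts execution to a fixed number of segment instances. The table name test_output_limited and the count of 3 are illustrative assumptions.

```sql
-- Hypothetical variant: run the script on 3 segment instances
-- instead of once per segment host.
CREATE EXTERNAL WEB TABLE test_output_limited
    (id int, name text)
EXECUTE '/tmp/load_scripts/get_test_data.sh' ON 3
FORMAT 'TEXT' (DELIMITER '|');
```

Writing ON ALL, or omitting the clause entirely, gives the default behavior of running the command on every segment instance.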
URL-based web tables get data from files on a web server using the HTTP
protocol. The LOCATION clause lists those files as http:// URLs. The web data
files must be accessible from the Greenplum segment hosts.
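A URL-based definition might look like the following sketch. The web server name, file paths, table name, and columns are hypothetical, and the files are assumed to be CSV with a header row.

```sql
-- Hypothetical URL-based web external table.
-- intranet.example.com and the file paths are illustrative assumptions.
CREATE EXTERNAL WEB TABLE ext_web_expenses
    (name text, amount numeric, category text)
LOCATION ('http://intranet.example.com/expenses/sales/file.csv',
          'http://intranet.example.com/expenses/marketing/file.csv')
FORMAT 'CSV' (HEADER);
```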