Implementing Analytics with Greenplum UAP - Getting Started with Greenplum for Big Data Analytics


INSERT INTO <<table_name>> (<<column names list separated by commas>>) VALUES (<<corresponding values>>);

• Example:

INSERT INTO employee (id, firstname, lastname) VALUES (001, 'John', 'Grisham');

• COPY : The COPY command is one of the initial ways of loading data. It is not parallelized, but it is typically used for loading large volumes of data, and multiple COPY commands can be run concurrently. It copies data from STDIN or to STDOUT over the connection between the master node and the client. Because it can handle large volumes and can be run concurrently by hand, it is often easier and quicker to use than the other options discussed below.

• Example:

COPY employees FROM '/usr/home/historicemployees.dat' WITH DELIMITER '|';

• External tables : External tables are unique to Greenplum and are typically used for high-speed, parallel bulk loading. They access file-based data using the file:// or gpfdist:// protocols, and dynamic sources can be accessed via the http:// protocol. More details on external tables are covered in the next section.

• gpload : gpload is a wrapper utility for external tables that internally uses a load specification defined in a YAML-formatted control file. More details on the gpload utility are covered in a separate section below.
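As an illustration of the external tables option listed above, the following is a minimal sketch of a parallel load through gpfdist. It assumes that a gpfdist process is already serving the directory containing historicemployees.dat; the host name etlhost, the port 8081, the table name ext_employees, and the column types are hypothetical and are not taken from the text.

```sql
-- Hypothetical readable external table over a pipe-delimited file.
-- Assumes gpfdist is running on etlhost:8081 and serving the file's directory.
CREATE EXTERNAL TABLE ext_employees (
    id        integer,
    firstname text,
    lastname  text
)
LOCATION ('gpfdist://etlhost:8081/historicemployees.dat')
FORMAT 'TEXT' (DELIMITER '|');

-- Load from the external table into the regular table.
-- Assumes employees has matching columns in the same order.
INSERT INTO employees
SELECT id, firstname, lastname
FROM ext_employees;
```

The external table itself holds no data; it only describes where the file lives and how to parse it, so the actual load happens when the INSERT ... SELECT runs.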

Before detailing the available options for loading data into Greenplum Database, let us take a deep dive into Greenplum's external tables. Greenplum has built-in ETL capabilities, and we can load and unload data using Greenplum's external tables. The following figure depicts the data loading process that involves loading data via the master node. Both INSERT and COPY commands follow this route.
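To make the master-node route more concrete, here is a small sketch of COPY using the client connection in both directions, written as it might appear in a psql script. The two data rows are made up for illustration, and the column list assumes the employees table has id, firstname, and lastname columns.

```sql
-- Load rows sent by the client over its connection to the master.
-- In a psql script, the data lines follow the command and end with \.
COPY employees (id, firstname, lastname) FROM STDIN WITH DELIMITER '|';
2|Jane|Austen
3|Mark|Twain
\.

-- Unload the table back to the client, again through the master.
COPY employees TO STDOUT WITH DELIMITER '|';
```

In both statements every row passes through the single client-to-master connection, which is why this path is not parallelized.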
