Greenplum Data Loader performs the following functions:
• Deployment of code
• Partitioning of data into chunks
• Splitting jobs into multiple tasks
• Scheduling the tasks, taking into account data locality and network topology
• Handling any job failures
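The chunk-partitioning and locality-aware scheduling steps above can be sketched as follows. This is an illustrative model only, not the Data Loader's actual implementation; the function names and round-robin fallback are assumptions.

```python
# Illustrative sketch: split a load job into fixed-size chunk tasks, then
# schedule each task on a worker that already holds the chunk's data when
# possible (data locality), falling back to round-robin otherwise.

def split_job(total_bytes, chunk_size):
    """Partition a job of total_bytes into (offset, length) chunk tasks."""
    tasks = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk_size, total_bytes - offset)
        tasks.append((offset, length))
        offset += length
    return tasks

def schedule(tasks, chunk_locations, workers):
    """Assign each task to a local worker if one exists, else round-robin."""
    assignment = {}
    next_worker = 0
    for task in tasks:
        local = chunk_locations.get(task)
        if local in workers:
            assignment[task] = local            # data is already on this node
        else:
            assignment[task] = workers[next_worker % len(workers)]
            next_worker += 1
    return assignment

tasks = split_job(250, 100)   # -> [(0, 100), (100, 100), (200, 50)]
locations = {(0, 100): "node-a", (200, 50): "node-c"}
plan = schedule(tasks, locations, ["node-a", "node-b", "node-c"])
```

A real scheduler would also weigh network topology (for example, preferring a rack-local worker when no node-local one exists), which this sketch omits.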
Greenplum Data Loader can dynamically scale the execution of data loading tasks to
maximize system resource utilization. It scales out linearly across multiple disks or
multiple machines, depending on the cluster setup.
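One way to picture this dynamic scaling is a worker pool whose size tracks the available resources. The sketch below is a toy illustration (here CPU count stands in for disks or machines, and `load_chunk` is a hypothetical placeholder), not the Data Loader's API.

```python
# Illustrative only: scale the number of parallel load workers with the
# resources available on this host.
import os
from concurrent.futures import ThreadPoolExecutor

def load_chunk(chunk_id):
    # Placeholder for real I/O: read one chunk from the source store
    # and write it to the destination cluster.
    return f"chunk-{chunk_id}:loaded"

def run_load(num_chunks, max_workers=None):
    # Default pool size grows with CPU count, capped to avoid oversubscription.
    workers = max_workers or min(32, (os.cpu_count() or 1) * 4)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_chunk, range(num_chunks)))

results = run_load(8)
```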
Additionally, the Greenplum Data Loader component supports a wide variety of source
data stores and access protocols: HDFS, local file system (DAS), NFS, FTP, and HTTPS. It
internally uses a master/slave architecture and can be managed through both a CLI and a GUI.
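Supporting several source protocols typically comes down to dispatching on the source URI's scheme. The sketch below shows that pattern for the protocols listed above; the reader names are illustrative assumptions, not the Data Loader's actual classes.

```python
# Hypothetical protocol dispatch for the supported source stores.
from urllib.parse import urlparse

READERS = {
    "hdfs":  "HdfsReader",
    "file":  "LocalFsReader",  # DAS; NFS mounts also appear as local paths
    "ftp":   "FtpReader",
    "https": "HttpsReader",
}

def pick_reader(source_uri):
    # Bare paths like /data/part-0001 have no scheme; treat them as local files.
    scheme = urlparse(source_uri).scheme or "file"
    try:
        return READERS[scheme]
    except KeyError:
        raise ValueError(f"unsupported source protocol: {scheme}")

pick_reader("hdfs://namenode:8020/data/part-0001")  # -> "HdfsReader"
```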
The bulk loader comprises the following components:
• Administrative GUI: manages data load processing and provides REST interfaces
for integration with external clients.
• Job scheduler: a scheduling service that schedules loading jobs.
• CLI: a command-line interface for running loading jobs.
The Greenplum Data Loader cluster copies data from the source data store to the
destination cluster. The cluster is composed of three types of logical nodes.