Implementing Analytics with Greenplum UAP - Getting Started with Greenplum for Big Data Analytics


The following are the functions:

• Deployment of code

• Partitioning of data into chunks

• Splitting jobs into multiple tasks

• Scheduling the tasks, taking into account data locality and network topology

• Handling any job failures
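The functions listed above can be sketched in a few lines of Python. This is only an illustrative model, not the product's actual implementation: the names `Task`, `partition`, `schedule`, and `run_with_retry` are hypothetical, locality is reduced to a single preferred host per chunk, and network topology is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One unit of work covering a single data chunk (hypothetical model)."""
    chunk_id: int
    preferred_host: str  # host assumed to hold the chunk locally
    attempts: int = 0


def partition(records, chunk_size):
    """Partition a list of records into chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def schedule(tasks, free_hosts):
    """Map chunk_id -> host, giving data-local placements priority."""
    available = list(free_hosts)
    assignments = {}
    pending = []
    # First pass: place every task whose preferred (local) host is free.
    for task in tasks:
        if task.preferred_host in available:
            available.remove(task.preferred_host)
            assignments[task.chunk_id] = task.preferred_host
        else:
            pending.append(task)
    # Second pass: place the remaining tasks on any host that is still free.
    for task in pending:
        if not available:
            break  # no capacity left; these tasks wait for the next round
        assignments[task.chunk_id] = available.pop(0)
    return assignments


def run_with_retry(task, run, max_attempts=3):
    """Run a task, retrying after a failure up to max_attempts times."""
    while task.attempts < max_attempts:
        task.attempts += 1
        try:
            return run(task)
        except Exception:
            continue  # treat any exception as a failed attempt and retry
    raise RuntimeError(f"task {task.chunk_id} failed after {max_attempts} attempts")
```

The two-pass scheduler is a deliberate simplification: placing local tasks first prevents a non-local task from taking a host that another task could have used locally.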

Greenplum Data Loader can dynamically scale the execution of data loading tasks to maximize the use of system resources. It can scale out linearly to multiple disks or multiple machines, depending on the cluster setup.

Additionally, the Greenplum Data Loader component supports a wide variety of source data stores and access protocols: HDFS, local FS (DAS), NFS, FTP, and HTTPS. Internally, it uses a master/slave architecture and can be managed through both a CLI and a GUI.
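As a rough illustration of handling several source protocols behind one interface, the sketch below classifies a source location by its URI scheme. The scheme strings (`hdfs`, `file`, `nfs`, `ftp`, `https`) and the function name `classify_source` are assumptions made for this example; the product's actual source specification format is not described here.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the source types named in the text.
SUPPORTED_SCHEMES = {
    "hdfs": "HDFS",
    "file": "local FS (DAS)",
    "nfs": "NFS",
    "ftp": "FTP",
    "https": "HTTPS",
}


def classify_source(uri):
    """Return (source_type, host, path) for a source URI, or raise ValueError."""
    parsed = urlparse(uri)
    # A bare path with no scheme is treated as a local file in this sketch.
    scheme = parsed.scheme or "file"
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported source scheme: {scheme!r}")
    return SUPPORTED_SCHEMES[scheme], parsed.hostname, parsed.path


if __name__ == "__main__":
    for uri in ("hdfs://namenode:8020/data/sales.csv", "/tmp/sales.csv"):
        print(uri, "->", classify_source(uri))
```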

Bulk loader components are listed in the following table:

Component              Summary
BulkLoader manager     An administrative GUI for managing data load processing. Provides REST interfaces to integrate with any other external clients.
BulkLoader scheduler   This is a job scheduling service to help schedule loading jobs.
BulkLoader CLI         This is a command-line interface to run loading jobs.

The Greenplum Data Loader cluster copies data from the source data store to the destination cluster. The cluster is composed of three types of logical nodes.
