Note
The three nodes are the master node, the slave node, and the BulkLoader CLI node.
Any existing MapReduce and HDFS deployment can be leveraged.
The following figure depicts the various components in Hadoop HDFS and the
BulkLoader components.
Using external ETL to load data into Greenplum
All the Greenplum utilities discussed earlier are limited in the data source
formats they support; as we have seen, these are typically file formats such as
TXT, XML, CSV, and other custom formats.
To support other data source formats, Greenplum can be integrated with an
external data integration tool such as Informatica, Pentaho, Talend, and others.
As part of the Data Integration Accelerator Module, Greenplum provides
integration endpoints for these ETL tools to facilitate high-speed parallel data
loading into the Greenplum Database. The following figure depicts how an external
ETL server can load data directly into the segment servers to achieve high
throughput.
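The parallel path from an ETL server to the segment servers can be illustrated with a small sketch. It assumes the ETL server exposes its output files through gpfdist, Greenplum's parallel file distribution server; the text does not say which mechanism the integration endpoints use, so this is only one plausible arrangement. The host names, port, table names, and helper functions below are hypothetical. The code only builds SQL text and does not connect to a database.

```python
# Sketch only: builds the SQL a loader might issue, assuming the ETL hosts
# serve their output files via gpfdist (an assumption, not stated in the text).

def build_external_table_sql(table, columns, etl_hosts, port=8081, pattern="*.csv"):
    """Return a CREATE EXTERNAL TABLE statement with one gpfdist URL per ETL host.

    table     -- name of the external table (hypothetical)
    columns   -- list of (column_name, sql_type) pairs
    etl_hosts -- host names of the ETL servers running gpfdist (hypothetical)
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # Each location is read by the segments directly, bypassing the master.
    locations = ",\n    ".join(
        f"'gpfdist://{host}:{port}/{pattern}'" for host in etl_hosts
    )
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION (\n    {locations}\n)\n"
        f"FORMAT 'CSV';"
    )


def build_load_sql(target, external):
    """Return the INSERT that pulls rows from the external table in parallel."""
    return f"INSERT INTO {target} SELECT * FROM {external};"


if __name__ == "__main__":
    print(build_external_table_sql(
        "ext_sales",
        [("id", "int"), ("amount", "numeric")],
        ["etl1", "etl2"],
    ))
    print(build_load_sql("sales", "ext_sales"))
```

Listing several ETL hosts in the LOCATION clause is what lets the segment servers pull data concurrently rather than funnelling everything through the master node.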