Now, let us look at the case of integrating Informatica PowerCenter with the
Greenplum Data Integration Accelerator (DIA).
Informatica provides PowerExchange (PWX) connectors for Greenplum that enable
high-speed parallel data loading. The Greenplum Database is designed to load large
volumes of data quickly with only a few jobs running in parallel. To take advantage
of Greenplum's capabilities, such large-volume loads through Informatica should use
PWX for Greenplum. PWX for Greenplum uses the Greenplum load utilities
gpload/gpfdist, which take advantage of the database's massively parallel,
shared-nothing architecture.
We can use the Informatica PWX connector for Greenplum with Greenplum DIA. In this
setup, the Greenplum segment servers connect directly to the external files served
by gpfdist. The load therefore bypasses the master server, and the segment servers
are loaded in parallel. The external tables point to the files that gpfdist streams
from the ETL host.
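To make the gpfdist path concrete, here is a minimal Python sketch. It only builds strings and does not connect to a database. It shows the kind of objects a gpfdist-based load relies on:

* a gpfdist process serving a directory on the ETL host;
* a readable external table whose LOCATION is a gpfdist:// URL;
* an INSERT ... SELECT that the segments execute in parallel.

The host name `etlhost`, port 8081, table `sales`, file pattern and pipe delimiter are all hypothetical. These are not necessarily the exact statements that gpload or PWX for Greenplum issue internally.

```python
"""Sketch of the objects behind a gpfdist-based parallel load.

Hypothetical values for illustration only: ETL host "etlhost",
gpfdist port 8081, target table "sales", pipe-delimited text files.
"""


def gpfdist_command(directory: str, port: int) -> list[str]:
    # gpfdist serves the files in `directory` (-d) over HTTP on `port` (-p).
    return ["gpfdist", "-d", directory, "-p", str(port)]


def external_table_ddl(target: str, etl_host: str, port: int, pattern: str) -> str:
    # Readable external table with the same columns as the target, whose
    # LOCATION is a gpfdist:// URL on the ETL host. When it is queried, each
    # segment connects to gpfdist itself, so the rows bypass the master.
    return (
        f"CREATE READABLE EXTERNAL TABLE ext_{target} (LIKE {target})\n"
        f"LOCATION ('gpfdist://{etl_host}:{port}/{pattern}')\n"
        "FORMAT 'TEXT' (DELIMITER '|');"
    )


def load_dml(target: str) -> str:
    # The load itself: the segments read the external table in parallel
    # and the rows are inserted into the target table.
    return f"INSERT INTO {target} SELECT * FROM ext_{target};"


if __name__ == "__main__":
    print(" ".join(gpfdist_command("/data/staging", 8081)))
    print(external_table_ddl("sales", "etlhost", 8081, "sales*.dat"))
    print(load_dml("sales"))
```

Because the external table is defined with `LIKE`, its column list always matches the target. The load can then be expressed as a plain `INSERT ... SELECT *`.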
Each instance of the loader utilities loads data into a single table. If a
PowerCenter mapping has multiple Greenplum targets, PWX for Greenplum starts a
separate loader instance for each target, and each loader instance has its own
connection to Greenplum. The total number of Greenplum connections used is
therefore the number of Greenplum targets in the mapping multiplied by the number
of partitions configured in the session. For mappings with many targets and/or many
partitions, this number of connections may exceed what the database allows, or may
cause out-of-memory issues on the Greenplum segments. In that case, consider staging
the data into a Greenplum staging table and using follow-on processing to load from
there into the target tables. Refer to the following section on ETLT for more details on
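A small Python sketch can check the connection arithmetic before a session is scheduled. It enumerates one loader instance per (target, partition) pair, as described above, and compares the total with a connection budget. The table names, the partition count and the budget of 30 spare connections are hypothetical. The real limit depends on the database's `max_connections` setting and on what else is already connected.

```python
"""Estimate the Greenplum connections a PWX for Greenplum session opens.

Per the text: one loader instance (and connection) per Greenplum target,
per session partition. All names and numbers below are hypothetical.
"""


def loader_instances(targets: list[str], partitions: int) -> list[tuple[str, int]]:
    # One loader instance for every (target, partition) pair.
    return [(t, p) for t in targets for p in range(1, partitions + 1)]


def connections_needed(targets: list[str], partitions: int) -> int:
    # Equals len(targets) * partitions.
    return len(loader_instances(targets, partitions))


def fits(targets: list[str], partitions: int, available: int) -> bool:
    # True if the session stays within the assumed connection budget.
    return connections_needed(targets, partitions) <= available


if __name__ == "__main__":
    targets = ["orders", "order_lines", "customers", "products", "shipments"]
    partitions = 8
    available = 30  # hypothetical number of spare connections

    direct = connections_needed(targets, partitions)
    print(f"direct load: {direct} connections, fits={fits(targets, partitions, available)}")

    # Staging variant: a single (hypothetical) staging table is the only
    # Greenplum target, so only one loader per partition is started.
    staging = ["stg_all"]
    staged = connections_needed(staging, partitions)
    print(f"staged load: {staged} connections, fits={fits(staging, partitions, available)}")
```

With five targets and eight partitions, the direct load needs 40 connections and exceeds the assumed budget. Loading a single staging table with the same eight partitions needs only 8 connections.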