Database Reference
In-Depth Information
this. Please refer to the Greenplum documentation for information on database con-
nections.
Following is the workflow between Informatica and Greenplum servers:
• PWX for Greenplum starts a gpload process providing it a configuration file
for the work to be done. It also creates a named pipe to pass data to gpf-
dist.
gpload kicks off a gpfdist process and gpfdist process provides data
to Greenplum segments.
gpload communicates with Greenplum Database and sets up the load.
• The Greenplum master communicates with the Greenplum segment servers
and instructs them to connect back to the gpfdist process to start pulling
in data.
• The Greenplum segment servers connect with gpfdist and request the
data.
• PWX for Greenplum writes data to the named pipe, gpfdist reads it from
the named pipe, and the Greenplum segment servers pull data in directly
from gpfdist.
The DIA servers, combined with the massively parallel processing databases in the
DCA, are perfectly configured to be used as nodes in a PowerCenter grid. The
scalability nature of the DIA allows you to add power and performance to your In-
formatica grid when more performance is needed for your data integration projects.
In this case where Informatica is installed within DIA, the data load leverages the
high-speed interconnect to load data.
Extraction, Load, and Transformation (ELT) and Extraction,
Transformation, Load, and Transformation (ETLT)
ELT and ETLT are highly performing approaches when working with Informatica and
Greenplum. Informatica can be used for complex parsing of source data and for
transformation that can be achieved without looking up large numbers of records
against large Greenplum tables. The data can be loaded to Greenplum staging
tables using PWX for Greenplum. Any remaining transformation logic in Greenplum
can be achieved in one of the following ways:
• Greenplum scripts
Search WWH ::




Custom Search