Chapter 4. Implementing Analytics with Greenplum UAP
In this chapter, we will focus on the actual implementation of the core tasks in the data science life cycle using the Greenplum analytics platform. As a quick recap, let us look at what we have covered so far. We have defined the characteristics of Big Data and the requirements for a next-generation analytics and business intelligence platform. We have also learnt about the various phases of the data science life cycle and understood what Greenplum has to offer to address analytics requirements. We have covered some theory on standard analytical methods and completed a quick onboarding exercise for the R, Weka, and MADlib frameworks. We now know the analytics requirements and where the Greenplum product suite can be leveraged.
Let's now look at the implementation using Greenplum products. We will also look at the integration between the various components.
This chapter covers the following topics:
• Data loading
    • Structured data (into Greenplum)
        • Using Greenplum loading utilities in combination with external tables
        • Using an external ETL tool such as Informatica (we will cover using Informatica's PWX Connector for Greenplum for high-speed data loading)
    • Unstructured data (into Hadoop)
        • Using Greenplum data loaders to load data into the Hadoop Distributed File System (HDFS)
        • Loading data from Hadoop (HDFS) into Greenplum
• Data unloading from Greenplum and Hadoop environments
• Querying and reporting data
    • Querying Greenplum