Implementing Analytics with Greenplum UAP - Getting Started with Greenplum for Big Data Analytics


Chapter 4. Implementing Analytics with Greenplum UAP

In this chapter, we will focus on the actual implementation of the core tasks in the data science life cycle using the Greenplum analytics platform. As a quick recap, let us look at what we have covered so far. We have defined the characteristics of Big Data and the requirements for a next-generation analytics and business intelligence platform. We have also learned about the various phases of the data science life cycle and seen what Greenplum offers to address analytics requirements. We have covered a little theory on some standard analytical methods and had a quick onboarding exercise for the R, Weka, and MADlib frameworks. We now have the analytics requirements and know where the Greenplum product suite can be leveraged.

Let's now look at the implementation using Greenplum products. We will also look at the integration between the various components.

This chapter covers the following topics:

• Data loading

    • Structured data (into Greenplum)

        • Using Greenplum loading utilities in combination with external tables

        • Using an external ETL tool (such as Informatica; we will cover using Informatica's PWX Connector for Greenplum for high-speed data loading)

    • Unstructured data (into Hadoop)

        • Using Greenplum data loaders to load data into the Hadoop Distributed File System (HDFS)

• Loading data from Hadoop (HDFS) into Greenplum

• Data unloading from Greenplum and Hadoop environments

• Querying and reporting data

    • Querying Greenplum
