Implementing Analytics with Greenplum UAP - Getting Started with Greenplum for Big Data Analytics

Summary

In this chapter, we have explored various implementation aspects of Greenplum UAP.

We started by looking at data loading strategies for Greenplum and HD. We loaded
data into Greenplum using internal utilities and functions such as gpload and gpfdist,
and also using the Informatica PowerExchange connector. For HD, we explored Hive
and the Greenplum bulk loader utility.

We then took a deep dive into the distribution and partitioning aspects of Greenplum,
along with strategies for querying Greenplum and HD. We looked at commands such as
ANALYZE and EXPLAIN to optimize queries and interpret query plans. Finally, we
explored some in-database analytics options with Greenplum (using window functions,
integrating MADlib, and using PL/R). At the end of this chapter, readers should be
fairly familiar with the various implementation aspects of Greenplum in conjunction
with Hadoop for implementing data storage and analytics for Big Data.