Greenplum Unified Analytics Platform (UAP) - Getting Started with Greenplum for Big Data Analytics


Greenplum Database

Greenplum Database is a shared-nothing, massively parallel processing (MPP) solution built to support next-generation data warehousing and Big Data analytics processing. It stores and analyzes large volumes of structured data. It comes in a software-only version that runs on commodity servers (this being its unique selling point) and is also available as an appliance (DCA) that can take advantage of large clusters of powerful servers, storage, and switches. GPDB (Greenplum Database) comes with a parallel query optimizer that uses a cost-based algorithm to evaluate candidate query plans and select the optimal one. Its high-speed interconnect supports continuous pipelining for data processing.
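To make the shared-nothing idea concrete, here is a minimal Python sketch. It is a toy illustration of the general principle, not Greenplum's actual implementation. The segment count, the CRC32-based placement, and the function names (segment_for, distribute, parallel_sum) are all assumptions made for this example. Each row is assigned to exactly one segment by hashing a key, every segment aggregates only its own rows, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment for a key with a deterministic hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field):
    """Place each row on exactly one segment; no data is shared between segments."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field])].append(row)
    return segments


def parallel_sum(segments, value_field):
    """Each segment sums its own rows; a coordinator adds up the partial sums."""
    partials = [sum(r[value_field] for r in rows) for rows in segments.values()]
    return sum(partials)


rows = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 20},
    {"customer": "alice", "amount": 5},
    {"customer": "carol", "amount": 7},
]
segments = distribute(rows, "customer")
total = parallel_sum(segments, "amount")
```

Because placement depends only on the key, all rows for a given customer land on the same segment, so a per-customer aggregate would need no data movement between segments in this sketch.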

Note

In its new distribution under Pivotal, Greenplum Database is called Pivotal

(Greenplum) Database.

Hadoop (HD)

HD stands for Hadoop. It is a commercially supported distribution of Apache Hadoop that includes HDFS (Hadoop Distributed File System), MapReduce, and other Apache ecosystem packages such as HBase, Hive, Pig, Mahout, Sqoop, Flume, YARN, and ZooKeeper.

Hadoop is known for storing and processing large volumes of unstructured data (on the order of petabytes) on commodity servers, using its robust distributed file system, HDFS, and its parallel processing framework, MapReduce. It is also known for its fault-tolerant, high-availability architecture.
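The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not the Hadoop API. The function names (map_phase, shuffle, reduce_phase) and the sample input are hypothetical. In a real cluster, the map and reduce calls would run in parallel on different nodes over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["big data on Hadoop", "Hadoop stores big data"]

# Each line can be mapped independently, which is what makes the step parallelizable.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Here counts maps each lowercased word to the number of times it occurs across both lines.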

Note

Some of the new endeavors in Pivotal with Pivotal HD include leveraging HD as

an underlying storage for Greenplum Database with a vision to have scalability
