Greenplum Database
Greenplum Database is a shared-nothing, massively parallel processing (MPP)
solution built to support next-generation data warehousing and Big Data analytics
processing. It stores and analyzes large volumes of structured data. It comes in a
software-only version that runs on commodity servers (this being its unique
selling point) and is also available as an appliance (DCA) that can take advantage
of large clusters of powerful servers, storage, and switches. GPDB (Greenplum
Database) comes with a parallel query optimizer that uses a cost-based algorithm
to evaluate and select optimal query plans. Its high-speed interconnect supports
continuous pipelining for data processing.
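The shared-nothing idea can be illustrated with a small sketch. This is not Greenplum's implementation; it is a minimal Python model under stated assumptions: rows are assigned to segments by hashing a key, each segment aggregates only its own rows, and a coordinator merges the partial results. The names NUM_SEGMENTS, distribute, local_sum, and parallel_sum are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of independent segments (nodes)


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Assign each (key, amount) row to exactly one segment by hashing its key."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        # crc32 is used because it is stable across runs, unlike str hash().
        index = crc32(key.encode("utf-8")) % num_segments
        segments[index].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Partial aggregate computed by one segment from its own rows only."""
    totals = defaultdict(int)
    for key, amount in segment_rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum(rows, num_segments=NUM_SEGMENTS):
    """Run local_sum on every segment in parallel, then merge the partials."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            merged[key] += total
    return dict(merged)


# Illustrative data only.
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = parallel_sum(sales)
```

Because every row with the same key lands on the same segment, no segment needs to read another segment's data to compute its part of the answer, which is the property the term "shared nothing" refers to.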
Note
In its new distribution under Pivotal, Greenplum Database is called Pivotal
(Greenplum) Database.
Hadoop (HD)
HD stands for Hadoop. Pivotal HD is a commercially supported distribution of
Apache Hadoop. It includes HDFS (Hadoop Distributed File System), MapReduce,
and other ecosystem packages from Apache such as HBase, Hive, Pig, Mahout,
Sqoop, Flume, YARN, and ZooKeeper.
Hadoop is known for its ability to store and process large volumes of
unstructured data (on the order of petabytes) on commodity servers, using its
robust underlying distributed file system, HDFS, and its parallel processing
framework, MapReduce. It is also known for its fault-tolerant, high-availability
architecture.
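The MapReduce model mentioned above can be sketched in a few lines. This is a single-process Python illustration of the map, shuffle, and reduce steps, not the Hadoop API; the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and a real job would run the map and reduce steps on many nodes over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Chain map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Illustrative input only.
counts = word_count(["big data on hadoop", "hadoop stores big data"])
```

Each map call depends only on its own document and each reduce call only on its own key, which is what allows the framework to spread the work across commodity servers.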
Note
Some of the new endeavors in Pivotal with Pivotal HD include leveraging HD as
an underlying storage for Greenplum Database with a vision to have scalability