Greenplum Database is a shared-nothing, massively parallel processing (MPP) data
warehousing solution designed to handle petabyte-scale data. It is built on the
open source database PostgreSQL. It can be deployed on physical or virtual
infrastructure, can run on any kind of hardware, and is also available as a
software-only version that customers can leverage. Now that we understand the
characteristics and concept of Big Data, let us next explore the concept of data
warehousing.
This section introduces readers to the concept of data warehousing as well as the
basic elements used in building and implementing a data warehouse.
A data warehouse is a consolidation of information gathered about the enterprise.
It is a centralized, single point of reference for enterprise data, which usually
comes from multiple sources, and it facilitates ease of access for analysis.
The following are the characteristics of data in a data warehouse:
• Integrated, centralized, and unique: Irrespective of the various sources of
data in an enterprise, a data warehouse is responsible for holding a single
copy of the data.
• Data definition/metadata: A data warehouse requires a unique data definition
that supports the data aggregation process. This data now becomes a single
version of truth for the enterprise.
• Relevant and subject-oriented: Data relevance is determined by its timely
availability and by historical data that can be referenced against a time
element. Data therefore usually has a time dimension.
• Non-volatile: Stored data is usually read-only.
• Security: Confidential data must be protected against unauthorized access.
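Three of these characteristics (a single integrated copy, a time dimension, and
non-volatility) can be illustrated with a minimal Python sketch. It is not how
Greenplum or any particular warehouse implements them; the CustomerFact record,
the consolidate helper, the "first source wins" rule, and the sample rows are
all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)  # frozen: a row cannot be modified once created (non-volatile)
class CustomerFact:
    customer_id: int  # shared key under one agreed data definition
    name: str
    as_of: date       # time dimension: the date this snapshot describes
    source: str       # originating system (hypothetical labels)


def consolidate(*sources):
    """Merge rows from several sources, keeping one row per (customer_id, as_of)."""
    warehouse = {}
    for rows in sources:
        for row in rows:
            # Assumption: the first source listed wins on duplicates.
            # A real load process would apply explicit reconciliation rules.
            warehouse.setdefault((row.customer_id, row.as_of), row)
    return warehouse


crm = [CustomerFact(1, "Acme Corp", date(2024, 1, 31), "crm")]
billing = [
    # Same customer and date as the CRM row: treated as a duplicate.
    CustomerFact(1, "Acme Corp", date(2024, 1, 31), "billing"),
    # Same customer, later date: kept as a separate historical snapshot.
    CustomerFact(1, "Acme Corp", date(2024, 2, 29), "billing"),
]

warehouse = consolidate(crm, billing)

try:
    warehouse[(1, date(2024, 1, 31))].name = "Changed"
except FrozenInstanceError:
    print("rows are read-only")
```

Keying each row on the customer and the snapshot date keeps history available
for time-based analysis while still holding only one copy of each fact.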
The following figure depicts the various components of a data warehouse architecture: