Greenplum Unified Analytics Platform (UAP) - Getting Started with Greenplum for Big Data Analytics

Database Reference

In-Depth Information

Column-oriented databases

Typically, all relational databases are row-oriented, each new row indicates a new

data set for the given table structure. Column-oriented data storage, like the name

indicates, stores data by its column rather than row. The primary difference lies in

the way the hard disk is accessed that results in efficiency. The following screenshot

depicts the difference between row- and column-oriented databases:

The drawback with the regular row-oriented RDBMS databases is that the number of

the rows in a table impacts on the performance of SQL query running on that table.

If we look at analytic query requirements, what is usually required is a column and

by employing a row-oriented storage, we end up accessing the whole row as the

columnislockedinitsplaceinarow.Inotherwords,thelevelofgranularityofI/O op-

erations is the record. There are a few techniques to overcome the full table scans,

one of which is indexing. But, as we all know, it comes with its own overhead.

With column-oriented databases (also referred to as column stores), the data is

decomposed into respective columns and the granularity of I/O access is now the

column and this could mean significant gain in query efficiency.

These stores are used for read-intensive data that is large in volume. A high degree

of compression can be achieved here and in Greenplum, these tables are by default

append-only. In short, column stores can be used in conjunction with row stores as

they are complimentary. Whenever a write is required, it is moved into a row store

that is later compressed and synchronized or moved into a column store.

Search WWH ::

Custom Search

Home