Typically, all relational databases are row-oriented, each new row indicates a new
data set for the given table structure. Column-oriented data storage, like the name
indicates, stores data by its column rather than row. The primary difference lies in
the way the hard disk is accessed that results in efficiency. The following screenshot
depicts the difference between row- and column-oriented databases:
The drawback with the regular row-oriented RDBMS databases is that the number of
the rows in a table impacts on the performance of SQL query running on that table.
If we look at analytic query requirements, what is usually required is a column and
by employing a row-oriented storage, we end up accessing the whole row as the
erations is the record. There are a few techniques to overcome the full table scans,
one of which is indexing. But, as we all know, it comes with its own overhead.
With column-oriented databases (also referred to as column stores), the data is
decomposed into respective columns and the granularity of I/O access is now the
column and this could mean significant gain in query efficiency.
These stores are used for read-intensive data that is large in volume. A high degree
of compression can be achieved here and in Greenplum, these tables are by default
append-only. In short, column stores can be used in conjunction with row stores as
they are complimentary. Whenever a write is required, it is moved into a row store
that is later compressed and synchronized or moved into a column store.