Database Reference
In-Depth Information
clause. For example, in the query above, we would write JOIN Employees BY
(EmployeeID, PostalCode) LEFT OUTER .
13.3 Column-Store Database Systems
So far, we have assumed DBMS architectures with the typical record-oriented
storage, where attributes of a record are placed contiguously in disk pages.
Thus, a disk page contains a certain number of database tuples, which at the
moment of being queried are accessed either sequentially or through some
of the indexes studied in Chap. 7 . These architectures are appropriate for
OLTP systems. For systems oriented to ad hoc querying large amounts of
data (like in OLAP), other structures can do better, for example, column-
store databases , where the values for each column (or attribute) are stored
contiguously in the disk pages, such that a disk page will contain a number of
database columns. Thus, a database record is scattered into many different
disk pages. We study this architecture next.
Figure 13.2 a shows the row-store organization, where records are stored in
disk pages. Figure 13.2 b shows the column-store alternative. In most systems,
a page contains a single column. However, if a column does not fit in a page,
it will be stored in as many pages as needed. When evaluating a query over
a column-store architecture, a DBMS just needs to read the values of the
columns involved in the query, thus avoiding to load into memory irrelevant
attributes. For example, consider a typical data warehouse query over the
Northwind data warehouse as follows:
SELECT CustomerName, SUM(SalesAmount)
FROM Sales S, Customer C, Product P, Time T, Employee E
WHERE S.CustomerKey = C.CustomerKey AND
S.ProductKey = P.ProductKey AND S.TimeKey = T.TimeKey AND
S.EmployeeKey = E.EmployeeKey AND
P.Discontinued = ' Yes ' AND T.Year = ' 2012 ' AND E.City = ' Berlin '
GROUP BY C.CustomerName
Depending on the query evaluation strategy, the query above may require
accessing all columns of all the tables in the FROM clause, totaling 51
columns. The number of columns can increase considerably in a real-world
enterprise data warehouse. However, only 12 of them are actually needed
to evaluate this query. Therefore, a row-oriented DBMS will read into main
memory a large number of columns that do not contribute to the result
and which will probably be pruned by a query optimizer. On the contrary,
a column-store database system will just look for the pages containing the
columns actually used in the query. Further, it is likely that the values for
E.City , T.Year ,and P.Discontinued will fit in main memory.
Search WWH ::




Custom Search