Databases Reference
In-Depth Information
Columnar Databases
Organizing data in rows has been the standard approach for so long that practitioners
have understood that this is the only way to store and retrieve data. An address list, a
customer list, an inventory of products—you can just envision the neat row of fields and
data going from left to right on your screen. Databases such as Oracle, MS SQL Server,
DB2 and MySQL are the best-known row-based databases.
Row-based databases are ubiquitous because so many of our most important business
systems are transactional. Row-oriented databases are well suited for transactional
environments, such as a call center where a customer's entire interaction history is required
when their profile is retrieved and/or when fields are frequently updated.
Where row-based databases run into trouble is when they are used to handle
analytic loads against large volumes of data, especially when user queries are dynamic
and ad-hoc in nature.
To understand why, let's look at a database of sales transactions with 50 days of
data and 1 million rows per day (Figure 4-5 ). Each row has 30 columns of data. So, this
database has 30 columns and 50 million rows. You want to see how many toasters were
sold for the third week of this period. A row-based database would return 7 million rows
(1 million for each day of the third week) with 30 columns for each row—or 210 million
data elements. That's a lot of data elements to crunch to find out how many toasters were
sold that week.
Figure 4-5. Column-based data structure
Column-oriented databases allow data to be stored column by column rather than
row by row. Column-oriented databases are better suited for analytics where, unlike
transactions, only portions of each record are required. By grouping the data together this
way, the database only needs to retrieve columns that are relevant to the query, greatly
reducing the overall I/O.
 
Search WWH ::




Custom Search