Emerging Database Landscape - Big Data Imperatives

Databases Reference

In-Depth Information

Columnar Databases

Organizing data in rows has been the standard approach for so long that practitioners

have understood that this is the only way to store and retrieve data. An address list, a

customer list, an inventory of products—you can just envision the neat row of fields and

data going from left to right on your screen. Databases such as Oracle, MS SQL Server,

DB2 and MySQL are the best-known row-based databases.

Row-based databases are ubiquitous because so many of our most important business

systems are transactional. Row-oriented databases are well suited for transactional

environments, such as a call center where a customer's entire interaction history is required

when their profile is retrieved and/or when fields are frequently updated.

Where row-based databases run into trouble is when they are used to handle

analytic loads against large volumes of data, especially when user queries are dynamic

and ad-hoc in nature.

To understand why, let's look at a database of sales transactions with 50 days of

data and 1 million rows per day (Figure 4-5 ). Each row has 30 columns of data. So, this

database has 30 columns and 50 million rows. You want to see how many toasters were

sold for the third week of this period. A row-based database would return 7 million rows

(1 million for each day of the third week) with 30 columns for each row—or 210 million

data elements. That's a lot of data elements to crunch to find out how many toasters were

sold that week.

Figure 4-5. Column-based data structure

Column-oriented databases allow data to be stored column by column rather than

row by row. Column-oriented databases are better suited for analytics where, unlike

transactions, only portions of each record are required. By grouping the data together this

way, the database only needs to retrieve columns that are relevant to the query, greatly

reducing the overall I/O.

Search WWH ::

Custom Search

Home