Data Warehousing Revisited - Data Warehousing in the Age of Big Data

Thus evolved the first generation of OLTP applications. Around the same time, in 1970, Edgar F. Codd published his paper on the relational model for managing data. 1 The paper was pivotal in several ways:

● It introduced for the first time a relationship-based approach to understanding data.

● It introduced the first approach to modeling data.

● It introduced the idea of abstracting the management and storage of data from the user.

● It discussed the idea of isolating applications and data.

● It discussed the idea of removing duplicates and reducing redundancy.

Codd's paper and the release of System R, the first experimental relational database, provided the first glimpse of a move to a relational model of database systems. The subsequent emergence of multiple relational databases, such as Oracle RDB, Sybase, and SQL/DS, in the 1980s was coupled with the first editions of the SQL language. OLTP systems grew stronger on the relational model; for the first time, companies were presented with two-tier applications in which the graphical user interface (GUI) was powerful enough to model front-end needs and the underlying data was completely hidden from the end user.

In the late 1970s and early 1980s, the first concepts of data warehousing emerged from the need to store and analyze the data produced by OLTP systems. The ability to gather data on transactions, products, services, and locations over a period of time gave companies capabilities that had never been available in the OLTP world, partly because of the design of OLTP systems and partly because of the limited scalability of the infrastructure.

Traditional data warehousing, or data warehousing 1.0

In the early days of OLTP systems, companies developed multiple applications, each to solve a different data need. This was good from the company's perspective because the systems processed data quickly and returned results, but the downside was that the results from two systems often did not match. For example, one system would report sales of $5,000 for the day and another would report $35,000 for the same day, from the same data. Reconciling data across the systems proved to be a nightmare.

Bill Inmon's definition of a data warehouse, which the industry accepts as the standard, states that the data warehouse is a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. 2
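
The four properties in this definition can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not how any particular warehouse product is built: the names SalesFact and Warehouse, the source system labels, the dates, and the amounts are all hypothetical. Each record is immutable once loaded (nonvolatile), tagged with a business subject (subject-oriented), dated (time-variant), and stored in one agreed unit regardless of which system produced it (integrated).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (nonvolatile)
class SalesFact:
    subject: str        # business subject area, e.g. "sales" (subject-oriented)
    source_system: str  # hypothetical originating OLTP system, kept for lineage
    as_of: date         # the day the figure describes (time-variant)
    amount_usd: float   # stored in one agreed unit for all sources (integrated)


class Warehouse:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, fact):
        self._rows.append(fact)

    def total(self, subject, as_of):
        # One query over all source systems gives a single answer per day.
        return sum(r.amount_usd for r in self._rows
                   if r.subject == subject and r.as_of == as_of)


# Illustrative data only; the systems and figures are made up.
wh = Warehouse()
wh.load(SalesFact("sales", "store_pos", date(2013, 1, 1), 5000.0))
wh.load(SalesFact("sales", "web_orders", date(2013, 1, 1), 1200.0))
wh.load(SalesFact("sales", "store_pos", date(2013, 1, 2), 4100.0))

day_one_total = wh.total("sales", date(2013, 1, 1))
print(day_one_total)
```

Because every source system loads into the same structure, a daily total is computed in one place rather than reported separately by each application, which is the reconciliation problem described earlier.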

The first generation of data warehouses, which we have built and continue to build, is tightly tied to the relational model and follows the principles of Codd's data rules. The design and architecture of the data warehouse have two parts. The first part deals with the data architecture and processing; per Codd's paper, it addresses the encapsulation of data from the user. The second part deals with the database architecture, infrastructure, and system architecture. Let us take a quick look at the data architecture and the infrastructure of the data warehouse before we discuss the challenges and pitfalls of traditional data warehousing.

1 Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, 13(6), 377-387. doi:10.1145/362384.362685.

2 http://www.inmoncif.com/home/
