CHAPTER 10
Integration of Big Data and Data Warehousing
It is a capital mistake to theorize before one has data.
—Sherlock Holmes, A Study in Scarlet (Arthur Conan Doyle)
INTRODUCTION
The data warehouse of today, while still building on the founding principles of an "enterprise version of truth" and a "single data repository," must address the needs of new data types, new volumes, new data-quality levels, new performance needs, new metadata, and new user requirements. As discussed in earlier chapters, current data warehouse environments have several issues that need to be addressed; more importantly, the current infrastructure cannot support the new data on the same platform. We have also discussed the emergence of new technologies that can improve the performance of the current data warehouse and provide a holistic platform for the extended requirements of the new data and its associated user needs. The big question is how to integrate all of this into one data warehouse. And, more importantly, how do we justify the data warehouse of the future?
The focus of this chapter is the integration of Big Data and the data warehouse: the possible techniques and pitfalls, and where to leverage each technology. How do we deal with the complexity and heterogeneity of these technologies? What are the performance and scalability characteristics of each technology, and how can we sustain performance in the new environment?
If we take a journey back in history and look at the architectural wonders that were built, we often wonder what kind of blueprints the architects considered, how they applied the laws of physics, and how they combined the chemical properties of materials so that the structures would last for centuries while withstanding visitor volumes and changes in climate. In building the new data warehouse, we need to adopt the radical thinking of the architects of yore: we will retain the fundamental definition of the data warehouse as stated by Bill Inmon, but we will develop a physical architecture that is not constrained by the boundaries of a single platform such as the RDBMS.
The next-generation data warehouse architecture will be complex in its physical deployment, consisting of a myriad of technologies. It will be data-driven from an integration perspective, and extremely flexible and scalable from a data architecture perspective.