As the diagram suggests, we might use Hadoop, or possibly a NoSQL or NewSQL database, to continually gather an ever-growing volume of data of uncertain quality. Such data can be characterized as low value: it has not been cleansed or heavily processed, and it may consist of simple event-stream data that requires further processing before value can be derived from it.
This data store is analogous to a staging area in traditional data warehouse design, but in the big data realm it is termed a “data ingestion” process resulting in a data “lake,” whose primary purpose is to support the extracts and transformations that feed other data stores. Relatively high latency is usually adequate for some of this activity; other uses may require continuous ingestion of event streams and real-time monitoring as the data is recorded.
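To make the ingestion step concrete, here is a minimal sketch in Python, assuming a hypothetical directory layout and event schema of our own choosing: raw events are appended, uncleansed, to a date-partitioned landing area in the lake.

# A minimal sketch of a data-lake ingestion step: raw, unprocessed events are
# appended to a date-partitioned landing area with no cleansing applied.
# The directory layout and event fields are illustrative assumptions.
import json
import os
from datetime import datetime, timezone

LAKE_ROOT = "/data/lake/raw/events"   # hypothetical landing-zone path

def ingest_event(event: dict) -> None:
    """Append one raw event, exactly as received, to today's partition."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = os.path.join(LAKE_ROOT, f"dt={day}")
    os.makedirs(partition, exist_ok=True)
    with open(os.path.join(partition, "events.jsonl"), "a") as f:
        f.write(json.dumps(event) + "\n")

ingest_event({"user_id": 42, "action": "click", "ts": "2014-01-01T00:00:00Z"})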
Next in the data flow is an enterprise data warehouse (EDW). Most likely it will serve analytic and BI applications that require better response times or a higher level of concurrency than the data lake can provide. We view this data store as holding more valuable data that has been processed, enriched, and contextualized by leveraging data from the data lake.
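The following sketch illustrates the kind of enrichment described above, using Python's built-in sqlite3 module as a stand-in for a real EDW engine; the table names and columns are assumptions made for illustration only.

# A simplified sketch of moving data from the lake into a warehouse-style table:
# raw events are joined with a customer dimension and loaded into an aggregate
# table that BI queries can hit with better response times.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id INTEGER, action TEXT, ts TEXT);
    CREATE TABLE dim_customer (user_id INTEGER, segment TEXT);
    CREATE TABLE fact_activity (segment TEXT, action TEXT, event_count INTEGER);
""")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)",
                 [(42, "click", "2014-01-01"), (42, "buy", "2014-01-02")])
conn.execute("INSERT INTO dim_customer VALUES (42, 'premium')")

# Enrich and contextualize: attach the customer segment, then aggregate.
conn.execute("""
    INSERT INTO fact_activity (segment, action, event_count)
    SELECT c.segment, e.action, COUNT(*)
    FROM raw_events e JOIN dim_customer c ON e.user_id = c.user_id
    GROUP BY c.segment, e.action
""")
print(conn.execute("SELECT * FROM fact_activity").fetchall())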
Next in the data flow come the analytics sandboxes, where you can expect relatively lower latency. This data store hosts sophisticated, highly compute-intensive analytics modules.
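As a rough illustration, the sandbox pattern amounts to pulling an extract and running compute-heavy routines on it away from the production warehouse; the data and metrics below are invented for the example.

# A small sketch of sandbox-style analytics: an extract is pulled into the
# sandbox and computation-heavy routines run over it without touching the
# EDW's production workload.
import statistics

daily_revenue = [1200.0, 1350.5, 990.0, 1410.2, 1500.7, 1605.1, 1580.3]

def moving_average(series, window=3):
    return [statistics.mean(series[i:i + window])
            for i in range(len(series) - window + 1)]

print(moving_average(daily_revenue))        # smoothed trend
print(statistics.pstdev(daily_revenue))     # volatility of the series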
Finally, higher-value data extracted from the analytic data store flows to an in-memory data store, which feeds applications that demand extremely low latency to satisfy business needs. It may well be the case that the best solution for such a set of business needs is to use a different database product for each workload type.
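A minimal sketch of that last hop might look like the following; a real deployment would use an in-memory database or cache product, but a plain Python dictionary is enough to show the publish-and-serve shape.

# High-value, precomputed results are pushed into an in-memory structure so
# the serving application answers lookups without re-running analytic queries.
# A Python dict stands in for a real in-memory store.
hot_store = {}

def publish(key: str, value) -> None:
    """Push a precomputed result from the analytic store into memory."""
    hot_store[key] = value

def serve(key: str):
    """Low-latency read path used by the application."""
    return hot_store.get(key)

publish("segment:premium:clicks_today", 1834)
print(serve("segment:premium:clicks_today"))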
Polyglot Persistence: The Next Generation
Database Architecture
Distributed databases, and NoSQL databases in particular, did solve the scalability and performance-related issues; however, they are just one part of the larger enterprise database management ecosystem. The enterprise database management landscape is all about catering to a mixed workload of OLAP and OLTP. SQL skills and tools are highly prevalent in that ecosystem and, more importantly, people have an SQL mind-set. Adopting a NoSQL-only database management system for the enterprise is therefore a hard proposition to accept. The primary challenge with NoSQL is that it's not SQL: each NoSQL data store is unique and so requires careful design consideration.
SQL focuses on the “what” (the ability to query and use the data) rather than the “how” (how the data is distributed). Business users and developers are well versed in the “what”; asking them to also learn the “how” is increasingly difficult. Hadoop is a great example of this phenomenon. Even though Hadoop has seen widespread adoption, it is still limited to silos within organizations. You won't find a large number of applications written exclusively for Hadoop, because developers first have to learn how to structure and organize data in a way that makes sense for Hadoop and then write extensive procedural logic to operate on that data set. Enterprise software is all about SQL, so embracing, extending, and augmenting SQL is a smart thing to do.
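The contrast between the declarative “what” and the procedural “how” can be sketched as follows; the SQL string and the hand-written grouping logic below are illustrative, not tied to any particular engine.

# The first form is the SQL a business user already knows; the second is the
# kind of grouping logic a developer must supply when no SQL layer exists.
# The sample records and field names are assumptions for the example.
records = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "east", "amount": 75},
]

# The "what": state the result you want.
declarative = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# The "how": spell out the shuffle and aggregate steps yourself.
def sum_by_region(rows):
    totals = {}
    for row in rows:                      # map/shuffle phase, by hand
        totals.setdefault(row["region"], 0)
        totals[row["region"]] += row["amount"]
    return totals                         # reduce phase, by hand

print(sum_by_region(records))             # {'east': 175, 'west': 250}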
But at the same time we can't ignore the power of NoSQL databases; hence, the use of heterogeneous data stores within the enterprise is gradually becoming a common practice.
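A rough sketch of what polyglot persistence looks like in code follows: one service routes each workload to the store best suited to it, with sqlite3 and a dictionary standing in for real relational and NoSQL products.

# A single service routes each workload to the store best suited to it: a
# relational engine for transactional order data and a key-value structure
# for high-volume, schema-flexible session events.
import sqlite3

relational = sqlite3.connect(":memory:")          # OLTP-style store
relational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
key_value = {}                                    # NoSQL-style store

def save_order(order_id: int, total: float) -> None:
    """Transactional write goes to the relational engine."""
    with relational:
        relational.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))

def log_session_event(session_id: str, event: dict) -> None:
    """High-volume, schema-flexible write goes to the key-value store."""
    key_value.setdefault(session_id, []).append(event)

save_order(1, 99.95)
log_session_event("abc123", {"action": "view", "item": "sku-7"})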