As the diagram suggests, we might use Hadoop, or possibly a NoSQL or NewSQL database, to continually gather an ever-growing volume of data of uncertain quality. Such data can be characterized as low value: it has not been cleansed or heavily processed, and it may consist of simple event-stream data that requires further processing before value can be derived from it.
This data store is analogous to a staging area in traditional data warehouse design, but in the big data realm it is termed a “data ingestion” process resulting in a data “lake,” whose primary purpose is to support the extracts and transformations that feed other data stores. Relatively high latency is usually adequate for some of this activity; other uses may require continuous ingestion of event streams and real-time monitoring as the data is recorded.
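To make the ingestion step concrete, here is a minimal sketch in Python, assuming a hypothetical directory layout and event schema of our own choosing: raw events are appended, uncleansed, to a date-partitioned landing area in the lake.

# A minimal sketch of a data-lake ingestion step: raw, unprocessed events are
# appended to a date-partitioned landing area with no cleansing applied.
# The directory layout and event fields are illustrative assumptions.
import json
import os
from datetime import datetime, timezone

LAKE_ROOT = "/data/lake/raw/events"   # hypothetical landing-zone path

def ingest_event(event: dict) -> None:
    """Append one raw event, exactly as received, to today's partition."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = os.path.join(LAKE_ROOT, f"dt={day}")
    os.makedirs(partition, exist_ok=True)
    with open(os.path.join(partition, "events.jsonl"), "a") as f:
        f.write(json.dumps(event) + "\n")

ingest_event({"user_id": 42, "action": "click", "ts": "2014-01-01T00:00:00Z"})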
Next in the data flow is an enterprise data warehouse (EDW). Most likely it will serve analytic and BI applications that require better response times or a higher level of concurrency than the data lake can provide. We view this data store as holding more valuable data that has been processed, enriched, and contextualized by leveraging data from the data lake.
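The following sketch illustrates the kind of enrichment described above, using Python's built-in sqlite3 module as a stand-in for a real EDW engine; the table names and columns are assumptions made for illustration only.

# A simplified sketch of moving data from the lake into a warehouse-style table:
# raw events are joined with a customer dimension and loaded into an aggregate
# table that BI queries can hit with better response times.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id INTEGER, action TEXT, ts TEXT);
    CREATE TABLE dim_customer (user_id INTEGER, segment TEXT);
    CREATE TABLE fact_activity (segment TEXT, action TEXT, event_count INTEGER);
""")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)",
                 [(42, "click", "2014-01-01"), (42, "buy", "2014-01-02")])
conn.execute("INSERT INTO dim_customer VALUES (42, 'premium')")

# Enrich and contextualize: attach the customer segment, then aggregate.
conn.execute("""
    INSERT INTO fact_activity (segment, action, event_count)
    SELECT c.segment, e.action, COUNT(*)
    FROM raw_events e JOIN dim_customer c ON e.user_id = c.user_id
    GROUP BY c.segment, e.action
""")
print(conn.execute("SELECT * FROM fact_activity").fetchall())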
Next in the data flow come the analytics sandboxes, where you can expect relatively lower latency. This data store hosts sophisticated, highly compute-intensive analytics modules.
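As a rough illustration, the sandbox pattern amounts to pulling an extract and running compute-heavy routines on it away from the production warehouse; the data and metrics below are invented for the example.

# A small sketch of sandbox-style analytics: an extract is pulled into the
# sandbox and computation-heavy routines run over it without touching the
# EDW's production workload.
import statistics

daily_revenue = [1200.0, 1350.5, 990.0, 1410.2, 1500.7, 1605.1, 1580.3]

def moving_average(series, window=3):
    return [statistics.mean(series[i:i + window])
            for i in range(len(series) - window + 1)]

print(moving_average(daily_revenue))        # smoothed trend
print(statistics.pstdev(daily_revenue))     # volatility of the series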
Finally, higher-value data extracted from the analytic data store flows to an in-memory data store, which feeds applications that demand extremely low latency to satisfy business needs. It may well be the case that the best solution for such a set of business needs is to use a different database product for each workload type.
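A minimal sketch of that last hop might look like the following; a real deployment would use an in-memory database or cache product, but a plain Python dictionary is enough to show the publish-and-serve shape.

# High-value, precomputed results are pushed into an in-memory structure so
# the serving application answers lookups without re-running analytic queries.
# A Python dict stands in for a real in-memory store.
hot_store = {}

def publish(key: str, value) -> None:
    """Push a precomputed result from the analytic store into memory."""
    hot_store[key] = value

def serve(key: str):
    """Low-latency read path used by the application."""
    return hot_store.get(key)

publish("segment:premium:clicks_today", 1834)
print(serve("segment:premium:clicks_today"))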
Polyglot Persistence: The Next Generation
Database Architecture
Distributed databases, and NoSQL databases in particular, did solve the scalability and performance-related issues; however, they are just one part of the larger enterprise database management ecosystem. The enterprise database management landscape is all about catering to a mixed workload of OLAP and OLTP. SQL skills and tools are highly prevalent in that ecosystem and, more importantly, people have an SQL mind-set. Adopting a NoSQL-only database management system for the enterprise is therefore a hard proposition to accept. The primary challenge with NoSQL is that it's not SQL: each NoSQL data store is unique and so requires careful design consideration.
SQL focuses on the “what” (the ability to query and use the data) rather than the “how” (how the data is distributed). Business users and developers are well versed in the “what”; asking them to also learn the “how” is increasingly difficult. Hadoop is a great example of this phenomenon. Even though Hadoop has seen widespread adoption, it is still limited to silos within organizations. You won't find a large number of applications written exclusively for Hadoop, because developers first have to learn how to structure and organize data in a way that makes sense for Hadoop and then write extensive procedural logic to operate on that data set. Enterprise software is all about SQL, so embracing, extending, and augmenting SQL is a smart thing to do.
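The contrast between the declarative “what” and the procedural “how” can be sketched as follows; the SQL string and the hand-written grouping logic below are illustrative, not tied to any particular engine.

# The first form is the SQL a business user already knows; the second is the
# kind of grouping logic a developer must supply when no SQL layer exists.
# The sample records and field names are assumptions for the example.
records = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "east", "amount": 75},
]

# The "what": state the result you want.
declarative = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# The "how": spell out the shuffle and aggregate steps yourself.
def sum_by_region(rows):
    totals = {}
    for row in rows:                      # map/shuffle phase, by hand
        totals.setdefault(row["region"], 0)
        totals[row["region"]] += row["amount"]
    return totals                         # reduce phase, by hand

print(sum_by_region(records))             # {'east': 175, 'west': 250}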
But at the same time we can't ignore the power of NoSQL databases; hence, the use of heterogeneous data stores within the enterprise is gradually becoming a common practice.
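A rough sketch of what polyglot persistence looks like in code follows: one service routes each workload to the store best suited to it, with sqlite3 and a dictionary standing in for real relational and NoSQL products.

# A single service routes each workload to the store best suited to it: a
# relational engine for transactional order data and a key-value structure
# for high-volume, schema-flexible session events.
import sqlite3

relational = sqlite3.connect(":memory:")          # OLTP-style store
relational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
key_value = {}                                    # NoSQL-style store

def save_order(order_id: int, total: float) -> None:
    """Transactional write goes to the relational engine."""
    with relational:
        relational.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))

def log_session_event(session_id: str, event: dict) -> None:
    """High-volume, schema-flexible write goes to the key-value store."""
    key_value.setdefault(session_id, []).append(event)

save_order(1, 99.95)
log_session_event("abc123", {"action": "view", "item": "sku-7"})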