Databases Reference
database systems. While these data modeling approaches were suitable for managing data
at scale, and then only for structured data, the big data realm has added the challenge
of variety, exposing shortcomings in the technology architecture and the
performance of relational databases.
The cost of scaling and managing infrastructure while delivering
a satisfactory consumer experience for newer applications, such
as Web 2.0 and social media applications, has proven to be quite
steep. This has led to the development of “NoSQL” databases
as an alternative technology with features and capabilities that
meet the needs of the particular use case.
Data Integration: For years, traditional data warehousing and data management
approaches have been supported by data integration tools for data migration and
transportation using the Extract-Transform-Load (ETL) approach. These tools run into
throughput issues when handling large volumes of data and are not very flexible in
handling semi-structured data.
To overcome these challenges in the big data scenario, there has
been a push toward extract-and-load approaches
(often referred to as data ingestion) combined with versatile,
programmatically driven parallel transformation techniques such
as map-reduce.
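
As a rough illustration of the extract-and-load plus map-reduce idea, the following minimal Python sketch loads raw, semi-structured records untouched and only afterward transforms them with a map step and a reduce step. The sample records, the function names (map_chunk, merge_counts, count_events), and the choice of counting event types are all assumptions made for this example and are not taken from the text. A local thread pool stands in for the cluster of workers that a real map-reduce framework would provide.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical raw records, ingested as-is (extract and load, no upfront transform).
# Fields vary from record to record, and one line is not even valid JSON.
RAW_LINES = [
    '{"user": "a", "event": "click"}',
    '{"user": "b", "event": "view", "page": "/home"}',
    'not valid json',
    '{"user": "a", "event": "click", "ref": "ad"}',
    '{"user": "c"}',
]


def map_chunk(lines):
    """Map step: parse one chunk and emit partial counts per event type."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<malformed>"] += 1
            continue
        if not isinstance(record, dict):
            counts["<malformed>"] += 1
            continue
        counts[record.get("event", "<missing>")] += 1
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial count tables into one."""
    return left + right


def count_events(lines, chunk_size=2, workers=2):
    """Split the input into chunks, map them concurrently, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    print(dict(count_events(RAW_LINES)))
```

Because the schema is applied only inside the map step, a new or loosely defined source can be ingested first and interpreted later, which is the flexibility the extract-and-load approach is after.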
Data integration as a process is highly cumbersome and iterative, especially when
you want to add new data sources. This step often delays the incorporation of new
data sources for analytics, so the data loses value and relevance before it
can be utilized. Current approaches to EDW follow the waterfall approach, in which
you cannot move on to the next phase until you finish the current one.
This approach has its merits: it ensures that the right data sources
are picked and the right data integration processes are developed
to sustain the usefulness of the EDW. In a big data scenario, however,
the situation is completely different. One has to ingest a growing
number of new data sources, many of them very loosely
defined or with no definitions at all, which poses
significant challenges to the traditional approach of the EDW
development lifecycle. In addition, there is a growing need from
the business to analyze data and get insightful, actionable
results quickly; the business is not ready to wait!
Cost: The costs to manage the data infrastructure (storage, computing, and analysis)
have risen significantly due to vendor lock-in and the use of proprietary technologies.
Most enterprises do not even have a clear picture of what kinds of data assets they have,
where those assets are located, and how much data they hold. In many cases, companies do
not understand these assets well enough to predict and anticipate data growth. With all
these unknowns, there is a dire need for quicker and more agile approaches to the entire
software development lifecycle.
 