Figure 10.2 Understanding Gold, Silver, and Bronze Hadoop environments
As the data graduates, we should start seeing other patterns emerge. These
patterns relate not only to the value in the data but also to its attributes.
Typically, the data takes on greater structure as it moves up, which facilitates
more mainstream analysis and integration with other tools. We would also
expect to see improvements in data quality and integrity in these higher
environments. This becomes increasingly important when we think about
integrating the data with the broader enterprise, because we need to pick
our moment when it comes to data integration.
Throw Compute at the Problem
All this flexibility comes at a cost, and that cost is a need to throw greater
amounts of compute power at the problem. This philosophy is baked into
Hadoop's architecture and is why batch engines like MapReduce exist.
MapReduce's ability to scale compute resources is a central tenet of Hadoop,
and Hadoop developers have really taken this to heart.
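The map/reduce model described above can be sketched in a few lines of plain Python. This is an illustrative toy, not Hadoop's actual Java API: a map phase emits key/value pairs, a shuffle groups them by key (the step Hadoop parallelizes across the cluster), and a reduce phase aggregates each group. The function names and sample records here are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; here, sum the counts.
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical sample input standing in for files on HDFS.
records = ["bronze silver gold", "gold gold silver"]
counts = reduce_phase(shuffle(map_phase(records)))
# counts == {"bronze": 1, "silver": 2, "gold": 3}
```

Because the map and reduce functions operate on independent records and independent key groups, each phase can be spread across as many machines as are available, which is exactly why "throwing compute at the problem" scales so naturally in this model.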
In some cases, there is an overreliance on this compute, which can lead to
a level of laziness. Rather than optimize the data processing and possibly
leverage other technologies, Hadoop developers can “waste” CPU by
re-executing the same MapReduce program over and over. Once the trend
 