Figure 10.2 Understanding Gold, Silver, and Bronze Hadoop environments
As the data graduates, we should start seeing other patterns emerge. These
patterns relate not only to the value in the data but also to its attributes.
Typically, the data takes on greater structure as it moves up, which facilitates
more mainstream analysis and integration with other tools. We would also
expect to see improvements in data quality and integrity in these higher
environments. This becomes increasingly important when we think about
integrating the data with the broader enterprise, because we need to pick
our moment when it comes to data integration.
Throw Compute at the Problem
All this flexibility comes at a cost, and that cost is a need to throw greater
amounts of compute power at the problem. This philosophy is baked into
Hadoop's architecture and is why batch engines like MapReduce exist.
MapReduce's ability to scale compute resources is a central tenet of Hadoop,
and Hadoop developers have really taken this to heart.
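The map/reduce model described above can be sketched in a few lines of plain Python. This is an illustrative toy, not Hadoop's actual Java API: a map phase emits key/value pairs, a shuffle groups them by key (the step Hadoop parallelizes across the cluster), and a reduce phase aggregates each group. The function names and sample records here are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; here, sum the counts.
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical sample input standing in for files on HDFS.
records = ["bronze silver gold", "gold gold silver"]
counts = reduce_phase(shuffle(map_phase(records)))
# counts == {"bronze": 1, "silver": 2, "gold": 3}
```

Because the map and reduce functions operate on independent records and independent key groups, each phase can be spread across as many machines as are available, which is exactly why "throwing compute at the problem" scales so naturally in this model.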
In some cases, there is an overreliance on this compute, which can lead to
a level of laziness. Rather than optimize the data processing and possibly
leverage other technologies, Hadoop developers can “waste” CPU by
re-executing the same MapReduce program over and over. Once the trend
 