• Who will manage and monitor them?
Because Hadoop is a scale-out architecture, the first question of quantity
is really a trigger point to think about the broader issues associated with
the scale of the deployment. In practice, the answer to the scale question
informs the decisions you make about other factors, particularly
flexibility and elasticity.
In terms of scale, there are also considerations that relate to limitations. For
example, in HDInsight, Microsoft currently allows a maximum of 40 data
nodes. However, this is merely an artificial cap placed on the service and can
be lifted. Architecturally no limit applies.
One might say the same about an on-premises deployment. Certainly, the
largest clusters in the world are on premises. However, practicalities will
often get in the way. In truth, the same challenges exist for Azure. There has
to be capacity in the data center to take your request. However, I have to say,
I quite like the idea of making this Microsoft's problem.
Security
Hadoop doesn't have a very sophisticated method of securing the data that
is resident in the Hadoop Distributed File System (HDFS). The security
models range from weak to none. Therefore, your approach to meeting
your security needs is an important factor in your decision-making process.
You might want to consider the network layer in addition to the operating
system and physical hardware when evaluating all these options. Other
options include a “secure by default” configuration, which may well be worth
replicating if you want to lock down your deployment.
Proximity
When addressing the question of proximity, you must know where the data
is born. This is relevant for a number of reasons, but the prime reason
is latency. We do not want the source and analytical systems to be far
apart, because if they are, this distance will add latency to the analysis. That
latency can often be directly correlated back to cost; a short local network
can often be significantly cheaper and result in less impact than a
geographically dispersed network.
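
As a rough illustration of why proximity matters, the following Python sketch compares the time needed to move the same data set over a short local link and over a long-distance link. The function name and every figure in it (data volume, bandwidth, round-trip time) are illustrative assumptions, not measurements from any real deployment, and the model ignores protocol overhead, congestion, and retries.

```python
def transfer_seconds(data_gb, bandwidth_mbps, rtt_ms, round_trips=1):
    """Estimate the time to move data between source and analytical systems.

    Simplified model (an assumption for illustration):
    time = data size / bandwidth + round trips * round-trip time.
    """
    bits = data_gb * 8 * 1000**3                      # decimal gigabytes to bits
    serialization = bits / (bandwidth_mbps * 1000**2)  # seconds on the wire
    overhead = round_trips * rtt_ms / 1000             # seconds of round-trip delay
    return serialization + overhead


# Hypothetical scenario: 100 GB of source data.
# Local link: 10 Gbps with a 0.5 ms round trip (assumed values).
local = transfer_seconds(data_gb=100, bandwidth_mbps=10_000, rtt_ms=0.5)

# Geographically dispersed link: 1 Gbps with an 80 ms round trip (assumed values).
remote = transfer_seconds(data_gb=100, bandwidth_mbps=1_000, rtt_ms=80)

print(f"local:  {local:.1f} s")
print(f"remote: {remote:.1f} s")
```

With these assumed figures the remote transfer takes roughly ten times as long as the local one. Most of the gap comes from the lower bandwidth that is typically affordable over long distances, which is one way latency translates back into cost.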
The value of the insights from the data may depreciate significantly as the
data ages. In these situations, therefore, we may want to keep in close