Cluster Specification
Hadoop is designed to run on commodity hardware. That means that you are not tied to expensive, proprietary offerings from a single vendor; rather, you can choose standardized, commonly available hardware from any of a large range of vendors to build your cluster.
“Commodity” does not mean “low-end.” Low-end machines often have cheap components,
which have higher failure rates than more expensive (but still commodity-class) machines.
When you are operating tens, hundreds, or thousands of machines, cheap components turn
out to be a false economy, as the higher failure rate incurs a greater maintenance cost. On
the other hand, large database-class machines are not recommended either, since they don't
score well on the price/performance curve. And even though you would need fewer of
them to build a cluster of comparable performance to one built of mid-range commodity
hardware, when one did fail, it would have a bigger impact on the cluster because a larger
proportion of the cluster hardware would be unavailable.
Hardware specifications rapidly become obsolete, but for the sake of illustration, a typical
choice of machine for running an HDFS datanode and a YARN node manager in 2014
would have had the following specifications:
Processor
    Two hex/octo-core 3 GHz CPUs
Memory
    64−512 GB ECC RAM
Storage
    12−24 × 1−4 TB SATA disks
Network
    Gigabit Ethernet with link aggregation
Although the hardware specification for your cluster will assuredly be different, Hadoop is
designed to use multiple cores and disks, so it will be able to take full advantage of more
powerful hardware.
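To make such hardware visible to Hadoop, each disk is typically mounted separately and listed in the datanode's configuration, and the node manager is told how much memory and how many cores it may hand out to containers. The hdfs-site.xml and yarn-site.xml fragments below are a minimal sketch: the property names are standard Hadoop 2.x configuration, but the mount points and resource figures are illustrative assumptions for a machine like the one above, not recommendations.

<!-- hdfs-site.xml: one directory per physical disk
     (mount points are illustrative) -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data</value>
</property>

<!-- yarn-site.xml: resources the node manager may allocate to containers
     (figures assume a 64 GB, 12-core machine, leaving headroom for the
     operating system and the Hadoop daemons themselves) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>12</value>
</property>

The datanode spreads new blocks across the listed directories, so each entry should map to a distinct physical disk rather than to partitions of the same one; this is what lets Hadoop's I/O scale with the number of spindles in the machine.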