workspace for queries, which can be up to 30% of the drive capacity. Finally,
try to always maintain at least 10% free disk space. History has taught us that
when disks have less than 10% free space, performance suffers. Add those
requirements up and you are at 420TB of space required. If each server you
purchase can store 30TB of data (10 drives × 3TB), you need a minimum of
14 worker nodes for your Hadoop appliance.
The Compression Factor
You may have noticed that I did not consider the compression factor in the
previous calculation. You will likely take advantage of compressing data
in Hadoop. Hadoop technologies such as Hive and Pig handle various
forms of compression very well. So, you may use gzip, bzip2, LZO, or Snappy
compression, and a compression factor of 5 to 7 is not unusual. Dividing
the required disk space by that factor reduces the number of worker nodes
required to support your solution.
Required disk space = (Replication factor × Total data in TB × 1.4) / Compression factor
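The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function and parameter names are hypothetical, the 1.4 multiplier is taken to cover the 30% query workspace plus the 10% free space described earlier, and the inputs (100TB of data, a replication factor of 3, a compression factor of 5) are assumptions chosen only because the first two reproduce the 420TB figure from the text.

```python
import math


def required_disk_tb(total_data_tb, replication_factor,
                     compression_factor=1.0, overhead=1.4):
    """Required disk space in TB, following the formula in the text.

    overhead=1.4 is assumed to cover 30% query workspace plus 10% free space.
    """
    if compression_factor <= 0:
        raise ValueError("compression_factor must be positive")
    return replication_factor * total_data_tb * overhead / compression_factor


def worker_nodes_needed(required_tb, node_capacity_tb=30.0):
    """Smallest whole number of nodes whose combined capacity covers required_tb."""
    # Round before taking the ceiling so floating-point noise
    # (for example 14.000000000000002) does not add a spurious node.
    return math.ceil(round(required_tb / node_capacity_tb, 9))


# Assumed inputs: 100TB of data, replication factor 3, 30TB per node.
uncompressed = required_disk_tb(total_data_tb=100, replication_factor=3)
compressed = required_disk_tb(total_data_tb=100, replication_factor=3,
                              compression_factor=5)

print(uncompressed, worker_nodes_needed(uncompressed))
print(compressed, worker_nodes_needed(compressed))
```

Under these assumed inputs, the uncompressed case works out to 420TB and 14 worker nodes, while a compression factor of 5 brings the requirement down to 84TB, which fits on 3 nodes of 30TB each.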
Next you need to consider CPU and memory. Each worker node should have,
at a minimum, two quad-core CPUs running at 2.5GHz or faster.
Hex-core and octo-core processors should be considered for compute-heavy
workloads. The newest chips are not necessary because the mid-level chips
will generally give you the processing power you need without generating
the heat and consuming the electricity of the most powerful chips. Generally
speaking, each task that runs inside a worker node will require anywhere
from 2GB to 4GB of memory. A machine with 96GB of memory
will be able to run between 24 and 48 tasks at any given time. Therefore, a system
with 14 worker nodes would be able to run 336 to 672 tasks at any given
time. Understanding the potential parallel computing requirements of your
solution will help you determine if this is sufficient.
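The task-capacity arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the calculation follows the text in dividing all of a node's memory among tasks; in practice some memory is reserved for the operating system and the Hadoop daemons, so treat the result as an upper bound.

```python
def tasks_per_node(node_memory_gb, gb_per_task):
    """Number of tasks that fit in a node's memory at a given per-task size."""
    return int(node_memory_gb // gb_per_task)


def cluster_task_range(node_memory_gb, worker_nodes,
                       min_gb_per_task=2, max_gb_per_task=4):
    """Return (fewest, most) concurrent tasks across the whole cluster."""
    # Larger tasks mean fewer fit per node, so the low end uses max_gb_per_task.
    low = tasks_per_node(node_memory_gb, max_gb_per_task) * worker_nodes
    high = tasks_per_node(node_memory_gb, min_gb_per_task) * worker_nodes
    return low, high


# Values from the text: 96GB per node, 14 worker nodes, 2GB to 4GB per task.
print(cluster_task_range(96, 14))  # prints (336, 672)
```

Changing the per-task memory or node count shows quickly how sensitive the cluster's parallelism is to each choice.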
Start the planning and building of your cluster assuming that you will begin
with a balanced cluster configuration. If you build your cluster from the
beginning with different server classes, different processors and memory,
or different storage capacities, you will be spending an inordinate amount