extracting information from large data sets such as the Web, social-network
graphs, and large document repositories.”
Identifying Operational Challenges
In this section, we'll cover what you need to do to plan for setup and
configuration. In addition, we'll discuss what you need to do to plan for
ongoing maintenance.
Planning for Setup/Configuration
An early decision that will need to be made is the quantity and quality
of hardware on which to run your Hadoop cluster. Generally, Hadoop is
designed to be built on commodity server hardware and JBODs (just a
bunch of disks). That doesn't mean that you can run down to your local
electronics store and buy a few cheap $800 servers and be good to go.
Commodity hardware is still server-class hardware; the point is that you
won't need to spend tens of thousands of dollars per server. You
generally purchase two classes of servers for your Hadoop cluster: one for
the master nodes and another for all the worker nodes.
The master server should have more redundancy built into it: multiple
power supplies, multiple Ethernet ports, RAID 1 for the operating system
LUN, and so forth. The master server requires more memory than the
worker nodes. Generally, you can start with 32GB of memory for a master
server of a small cluster and grow that to as much as 128GB or more for a
large cluster that has more than 250 worker nodes.
The worker servers don't need the redundancy of the master server, but they
do need to be built with balance in mind. They need to be able to store the
data you have planned for your Hadoop cluster, but they also need to be able
to process it appropriately when it's time to query the data. You first need
to consider how many disks you need and what size they should be. Of course,
this depends on the hardware vendor and the configuration of the server that
you are purchasing. Beyond that, you will need to make an educated guesstimate
of your needs.
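As a rough illustration of that guesstimate, the following minimal Python sketch turns a target amount of data into a worker-node count. The function names and the example node configuration (12 disks of 4TB each) are hypothetical. The only Hadoop-specific input is the replication factor, which HDFS sets to 3 by default. The sketch deliberately ignores operating-system disks, temporary space, and filesystem overhead, so treat its result as a lower bound.

```python
import math


def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed to hold `usable_tb` of data at the given replication factor."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Every block is stored `replication_factor` times across the cluster.
    return usable_tb * replication_factor


def workers_needed(usable_tb: float, disks_per_node: int, disk_size_tb: float,
                   replication_factor: int = 3) -> int:
    """Smallest number of worker nodes whose combined data disks hold the replicated data.

    Hypothetical sizing helper: it ignores OS disks, temporary space,
    and filesystem overhead, so the result is a lower bound.
    """
    per_node_tb = disks_per_node * disk_size_tb
    if per_node_tb <= 0:
        raise ValueError("each worker must have positive disk capacity")
    # Round up: a partial node still has to be purchased as a whole server.
    return math.ceil(raw_capacity_tb(usable_tb, replication_factor) / per_node_tb)


if __name__ == "__main__":
    # 100TB of data on hypothetical workers with 12 x 4TB JBOD disks each.
    print(raw_capacity_tb(100))          # 300
    print(workers_needed(100, 12, 4.0))  # 300 / 48 = 6.25, rounded up to 7
```

Rounding up with `math.ceil` reflects that servers are bought whole; any headroom you add on top of this estimate will only increase the count.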
The first thing to remember is that HDFS stores every block of your data three
times. Assuming that you are using the default replication factor of 3, if
you need 100TB of usable space, you will need enough servers to store
300TB. But you aren't done yet. You need temporary