6.6.4 Racks
Racks themselves do not usually fail. They are steel and have no active components.
However, many failures are rack-wide. For example, a rack may have a single power feed
or network uplink that is shared by all the equipment in the rack. Intrusive maintenance is
often done one rack at a time.
As a result, a rack is usually a failure domain. In fact, intentionally designing each rack
to be its own failure domain turns out to be a good, manageable size for most distributed
systems.
Rack Diversity
You can choose to break a service into many replicas and put one replica in each rack.
With this arrangement, the service has rack diversity. A simple example would be a DNS
service where each DNS server is in a different rack so that a rack-wide failure does not
cause a service outage.
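The following is a minimal sketch of rack-diverse placement. It assumes a hypothetical inventory that maps each machine name to its rack name. It picks one machine from each of several racks and checks whether the placement survives the loss of any single rack. The names `place_one_per_rack` and `survives_any_rack_failure` are illustrative and are not part of any real tool.

```python
from collections import defaultdict


def place_one_per_rack(machine_racks, replicas):
    """Pick `replicas` machines, each in a different rack.

    machine_racks: hypothetical inventory mapping machine name -> rack name.
    Raises ValueError if there are fewer racks than requested replicas.
    """
    # Group machines by rack, in sorted order so the result is repeatable.
    by_rack = defaultdict(list)
    for machine, rack in sorted(machine_racks.items()):
        by_rack[rack].append(machine)
    if len(by_rack) < replicas:
        raise ValueError("not enough racks for rack diversity")
    # Take the first machine from each of the first `replicas` racks.
    return [by_rack[rack][0] for rack in sorted(by_rack)[:replicas]]


def survives_any_rack_failure(placement, machine_racks):
    """True if losing any one rack still leaves at least one replica running."""
    racks_used = {machine_racks[m] for m in placement}
    # Replicas spread over two or more racks cannot all share one failure domain.
    return len(racks_used) >= 2


inventory = {"dns1": "r1", "dns2": "r1", "dns3": "r2", "dns4": "r3"}
placement = place_one_per_rack(inventory, 3)  # ["dns1", "dns3", "dns4"]
print(placement, survives_any_rack_failure(placement, inventory))
```

Placing both `dns1` and `dns2` would not be rack diverse, because they share rack `r1`. A single rack-wide power or uplink failure would take out both.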
In a Hadoop cluster, data files are stored on multiple machines for safety. The system
tries to achieve rack diversity by making sure that at least one replica of any data block is
in a different rack from the other replicas of that block.
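A rack-aware choice for a replication factor of three can be sketched roughly as follows. This is a simplified, deterministic sketch loosely modeled on the default HDFS policy: one replica on the writer's machine, and two on different machines in one remote rack. Real HDFS also randomizes its choices and considers load and free space. `place_block` and the inventory format are hypothetical.

```python
from collections import defaultdict


def place_block(writer, machine_racks):
    """Choose machines for three replicas of one data block (simplified sketch).

    writer: the machine writing the block; it keeps the first replica.
    machine_racks: hypothetical inventory mapping machine name -> rack name.
    """
    # Group machines by rack, in sorted order so results are repeatable.
    by_rack = defaultdict(list)
    for machine, rack in sorted(machine_racks.items()):
        by_rack[rack].append(machine)

    local_rack = machine_racks[writer]
    for rack in sorted(by_rack):
        # Put the second and third replicas on two different machines in one
        # remote rack. The block then spans two racks and survives the loss of
        # either one. If copies are written as a chain (writer -> second ->
        # third), only the first hop crosses the inter-rack uplink.
        if rack != local_rack and len(by_rack[rack]) >= 2:
            second, third = by_rack[rack][:2]
            return [writer, second, third]
    raise ValueError("need another rack with at least two machines")


inventory = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "b2": "rackB", "c1": "rackC"}
print(place_block("a1", inventory))  # ['a1', 'b1', 'b2']
```

Keeping two of the three copies in the same remote rack is a trade-off. The block stays rack diverse while inter-rack write traffic is limited.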
Rack Locality
Making a service component self-contained within a single rack also offers certain benefits.
Bandwidth is plentiful within a rack but scarce between racks. All the machines in the rack
connect to the same switch at the top of the rack. This switch has enough internal bandwidth
that any machine can talk to any other machine within the rack at full bandwidth, and all
machines can do this at the same time, a scheme called non-blocking bandwidth.
Between racks there is less bandwidth. Rack uplinks often have 10 times the bandwidth of the
link to an individual machine, but they are a shared resource used by all the machines in the
rack (typically 20 or 40), so the machines contend for bandwidth between racks. The article
“A Guided Tour through Data-center Networking” (Abts & Felderman 2012) drills down into this
topic using Google's networks as examples.
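To make the contention concrete, here is a back-of-the-envelope calculation that uses the figures above: an uplink 10 times a machine's link, shared by 20 or 40 machines. The 10 Gb/s machine link speed is an assumed value chosen for illustration. The oversubscription ratio itself depends only on machines per rack divided by the uplink multiple. The sketch also assumes the uplink is split evenly and ignores traffic that stays inside the rack.

```python
def rack_uplink_share(machines_per_rack, host_link_gbps, uplink_multiple=10):
    """Return (oversubscription ratio, per-machine cross-rack share in Gb/s)."""
    uplink_gbps = host_link_gbps * uplink_multiple
    # Demand if every machine tried to send off-rack at full speed at once.
    demand_gbps = machines_per_rack * host_link_gbps
    oversubscription = demand_gbps / uplink_gbps
    # An even split of the shared uplink among all machines in the rack.
    fair_share_gbps = uplink_gbps / machines_per_rack
    return oversubscription, fair_share_gbps


for machines in (20, 40):
    ratio, share = rack_uplink_share(machines, host_link_gbps=10)
    print(f"{machines} machines: {ratio:.0f}:1 oversubscribed, "
          f"{share:.1f} Gb/s each off-rack vs 10 Gb/s in-rack")
```

Under these assumptions a full rack of 40 machines gets 2.5 Gb/s per machine off-rack. Within the rack, by contrast, the non-blocking switch still gives each machine its full 10 Gb/s.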
Because bandwidth is plentiful inside the rack and the rack is a failure domain, often a
service component is designed to fit within a rack. Small queries come in, they use a lot of
bandwidth to generate the answer, and a small or medium-size reply leaves. This model fits
well given the bandwidth restrictions.
The service component is then replicated on many racks. Each replica has rack locality, in
that it is self-contained within the rack. It is designed to take advantage of the high
bandwidth and the rack-sized failure domain.
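A hypothetical front end for such a service might send each query to one rack-local replica and skip racks that are down. `route_query` and its modulo spreading are illustrative assumptions. A real load balancer would also weigh load and health-check results.

```python
def route_query(query_id, replica_racks, failed_racks=frozenset()):
    """Pick the rack whose self-contained replica will answer this query.

    replica_racks: racks that each hold a complete copy of the service component.
    failed_racks: racks currently known to be down (hypothetical health data).
    """
    healthy = [rack for rack in replica_racks if rack not in failed_racks]
    if not healthy:
        raise RuntimeError("no healthy replica rack")
    # Spread queries over healthy racks. The heavy traffic for a query then
    # stays on that rack's top-of-rack switch; only the small query and the
    # reply cross the rack uplink.
    return healthy[query_id % len(healthy)]


racks = ["r1", "r2", "r3"]
print(route_query(4, racks))          # r2
print(route_query(1, racks, {"r2"}))  # r3, because rack r2 is down
```

A rack-wide failure removes exactly one replica, so the remaining racks absorb its queries. This is the rack-sized failure domain at work.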