For example, imagine a distributed file system with petabytes of data spread out over
thousands of machines. Each file is split into gigabyte-sized chunks. Each chunk is stored
on multiple machines for redundancy. This scheme also permits the creation of files larger
than those that would fit on one machine. A master server tracks the list of files and
identifies where their chunks are. If you are familiar with the UNIX file system, the master can
be thought of as storing the inodes, or per-file lists of data blocks, and the other machines
as storing the actual blocks of data. File system operations go through the master server, which
uses the inode-like information to determine which machines to involve in the operation.
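The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the design of any particular file system: the class name `Master`, the replication factor of 3 (the text says only "multiple machines"), and the round-robin placement policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 1 << 30  # 1 GiB, standing in for "gigabyte-sized chunks"
REPLICAS = 3          # assumed; the text says only "multiple machines"


@dataclass
class Master:
    """Hypothetical master: stores inode-like metadata, never file data."""
    machines: list
    # file name -> list of chunks; each chunk is the list of machines holding a copy
    files: dict = field(default_factory=dict)

    def create(self, name, size):
        if REPLICAS > len(self.machines):
            raise ValueError("not enough machines for the replication factor")
        n_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division
        placement = []
        for i in range(n_chunks):
            # Round-robin placement is an assumed policy, chosen for brevity.
            holders = [self.machines[(i + r) % len(self.machines)]
                       for r in range(REPLICAS)]
            placement.append(holders)
        self.files[name] = placement

    def locate(self, name):
        """Return, for each chunk in order, the machines that hold a copy."""
        return self.files[name]
```

Because a file is just an ordered list of chunk locations, its total size is bounded by the cluster rather than by any single machine's disk.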
Imagine that a large read request comes in. The master determines that the file has a few
terabytes stored on one machine and a few terabytes stored on another machine. It could
request the data from each machine and relay it to the system that made the request, but
the master would quickly become overloaded while receiving and relaying huge chunks of
data. Instead, it replies with a list of which machines have the data, and the requestor
contacts those machines directly for the data. This way the master is not the middleman for
those large data transfers. This situation is illustrated in Figure 1.7.
Figure 1.7: This master server delegates replies to other servers.
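The read path described above might look like the following sketch. The names `ChunkServer`, `Master`, and `read_file` are hypothetical, and the example keeps one copy of each chunk and uses in-memory dictionaries in place of network calls. The point it illustrates is that the master returns only locations, while the bytes flow directly between the requestor and the machines that hold them.

```python
class ChunkServer:
    """Hypothetical storage machine holding the actual blocks of data."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # (file name, chunk index) -> bytes

    def read(self, file_name, index):
        return self.chunks[(file_name, index)]


class Master:
    """Hypothetical master holding only the chunk-location lists."""

    def __init__(self):
        self.locations = {}  # file name -> server name for each chunk, in order

    def lookup(self, file_name):
        # The reply is a small list of machine names, never the file data.
        return list(self.locations[file_name])


def read_file(master, servers, file_name):
    """Requestor side: ask the master where, then fetch from each machine."""
    locations = master.lookup(file_name)
    parts = [servers[name].read(file_name, index)
             for index, name in enumerate(locations)]
    return b"".join(parts)


# Usage: a two-chunk file spread over two machines.
servers = {"a": ChunkServer("a"), "b": ChunkServer("b")}
servers["a"].chunks[("log", 0)] = b"first half, "
servers["b"].chunks[("log", 1)] = b"second half"
master = Master()
master.locations["log"] = ["a", "b"]
data = read_file(master, servers, "log")
```

The master's work per request is proportional to the number of chunks, not to the number of bytes transferred, which is why it avoids becoming the bottleneck.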
1.5 The CAP Principle
CAP stands for consistency, availability, and partition resistance. The CAP Principle states
that it is not possible to build a distributed system that guarantees consistency, availability,
and resistance to partitioning. Any one or two can be achieved but not all three
simultaneously. When using such systems you must be aware of which are guaranteed.
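To make the trade-off concrete, here is a toy sketch of two replicas of a single value. Everything in it is an assumption made for illustration: the `Replica` class, the `prefer_consistency` flag, and the simulated `partitioned` switch do not come from any real system. When the replicas cannot reach each other, a write must either be refused (giving up availability) or accepted locally (giving up consistency).

```python
class Replica:
    """Toy replica of one value; 'partitioned' simulates a network split."""

    def __init__(self, prefer_consistency):
        self.value = None
        self.peer = None
        self.partitioned = False
        self.prefer_consistency = prefer_consistency

    def write(self, value):
        if self.partitioned:
            if self.prefer_consistency:
                # Refuse the write: replicas stay identical, but the
                # system is unavailable for writes during the partition.
                return False
            # Accept locally: the system stays available, but the two
            # replicas now disagree until the partition heals.
            self.value = value
            return True
        # No partition: update both copies.
        self.value = value
        self.peer.value = value
        return True


def make_pair(prefer_consistency):
    """Build two replicas that know about each other."""
    a, b = Replica(prefer_consistency), Replica(prefer_consistency)
    a.peer, b.peer = b, a
    return a, b
```

A real system would also need a way to reconcile diverged replicas once the partition heals, which this sketch leaves out.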