Figure 2.1: Compute nodes are organized into racks, and racks are interconnected
by a switch (figure labels: "Switch", "Racks of compute nodes")
disk crashes, the files would be lost forever. We discuss file management
in Section 2.1.2.
2. Computations must be divided into tasks, such that if any one task fails
to execute to completion, it can be restarted without affecting other tasks.
This strategy is followed by the map-reduce programming system that we
introduce in Section 2.2.
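The restart property in point 2 can be illustrated with a minimal single-process sketch. This is not the map-reduce system itself, only an illustration under stated assumptions: the names run_with_restarts and make_flaky_task are hypothetical, and a node failure is simulated by raising RuntimeError. Each task is rerun on its own until it completes, and the results of the other tasks are never touched.

```python
def make_flaky_task(value, failures):
    """Hypothetical task: fails `failures` times, then returns value squared."""
    state = {"remaining": failures}

    def task():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated node failure")
        return value * value

    return task


def run_with_restarts(tasks, max_attempts=5):
    """Run each task independently, restarting only the task that failed."""
    results = {}
    attempts = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                results[name] = task()
                break  # this task finished; no other task is affected
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up after too many restarts
    return results, attempts


tasks = {
    "t1": make_flaky_task(2, failures=0),
    "t2": make_flaky_task(3, failures=2),  # fails twice, succeeds on attempt 3
    "t3": make_flaky_task(4, failures=0),
}
results, attempts = run_with_restarts(tasks)
print(results)   # {'t1': 4, 't2': 9, 't3': 16}
print(attempts)  # {'t1': 1, 't2': 3, 't3': 1}
```

Only t2 is rerun; t1 and t3 each execute exactly once. This works because the tasks share no state, which is the condition point 2 places on how a computation is divided.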
2.1.2 Large-Scale File-System Organization
To exploit cluster computing, files must look and behave somewhat differently
from the conventional file systems found on single computers. This new file
system, often called a distributed file system or DFS (although this term has
had other meanings in the past), is typically used as follows.
• Files can be enormous, possibly a terabyte in size. If you have only small
files, there is no point using a DFS for them.
• Files are rarely updated. Rather, they are read as data for some calculation,
and possibly additional data is appended to files from time to time.
For example, an airline reservation system would not be suitable for a
DFS, even if the data were very large, because the data is changed so
frequently.
Files are divided into chunks, which are typically 64 megabytes in size.
Chunks are replicated, perhaps three times, at three different compute nodes.
Moreover, the nodes holding copies of one chunk should be located on different