FIGURE 4.4
Google GFS: applications use a GFS client to reach a single GFS master (metadata and chunk locations) and multiple GFS chunk servers, each storing chunk data on its local file system.
Source: Google briefing.²
Google file system
In the late 1990s, Google was expanding its search processing capabilities to scale effectively over massive volumes of data. In the quest for performance and scalability, Google discovered that its requirements could not be met by traditional file systems, and thus was born the need for a new file system that could deliver extremely high performance for large-scale data processing on commodity hardware clusters.
Google subsequently published the design concepts in 2003, in a paper titled "The Google File System" (GFS), which has since revolutionized the industry. The key pieces of the architecture, as shown in Figure 4.4, include:
A GFS cluster:
• A single master (its in-memory metadata tables are sketched after this list)
• Multiple chunk servers (workers or slaves) per master
• Accessed by multiple clients
• Running on commodity Linux machines
A file:
• Represented as fixed-size chunks
• Labeled with 64-bit unique global IDs
• Stored at chunk servers and three-way mirrored across chunk servers
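The split of responsibilities between the master and the chunk servers can be made concrete with a small data model. The sketch below is illustrative only, not Google's implementation; the class name MasterMetadata and its fields are assumptions used to show that the master keeps only the file namespace and chunk locations, while the chunk data itself lives on the chunk servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MasterMetadata:
    """What a GFS-style master keeps in memory; chunk data never passes through it."""
    # File namespace: file path -> ordered list of 64-bit chunk handles.
    namespace: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk locations: chunk handle -> addresses of chunk servers holding a replica.
    locations: Dict[int, List[str]] = field(default_factory=dict)

    def chunks_of(self, path: str) -> List[int]:
        """All chunk handles that make up a file, in order."""
        return self.namespace.get(path, [])

    def replicas_of(self, handle: int) -> List[str]:
        """Chunk servers that currently hold a replica of this chunk."""
        return self.locations.get(handle, [])
```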
In the GFS cluster, input data files are divided into chunks (64 MB is the standard chunk size), each assigned a unique 64-bit handle and stored as a file on the local file system of a chunk server. To ensure fault tolerance and scalability, each chunk is replicated on at least one other server, and the default design creates three copies of each chunk.
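As a rough illustration of this chunking and placement scheme, the sketch below splits a file into 64 MB chunks, gives each a 64-bit handle, and picks three chunk servers per chunk. The random handle generation and round-robin placement are assumptions made for the example, not the actual GFS policy.

```python
import secrets
from typing import Dict, List

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the standard GFS chunk size
REPLICAS = 3                   # default number of copies per chunk


def chunk_count(file_size: int) -> int:
    """Number of fixed-size chunks needed to hold a file of file_size bytes."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def place_chunks(file_size: int, servers: List[str]) -> Dict[int, List[str]]:
    """Assign each chunk a 64-bit handle and three replica servers (illustrative round-robin)."""
    placement: Dict[int, List[str]] = {}
    for i in range(chunk_count(file_size)):
        handle = secrets.randbits(64)  # unique 64-bit global ID
        # Spread replicas across distinct servers (assumes len(servers) >= REPLICAS).
        placement[handle] = [servers[(i + r) % len(servers)] for r in range(REPLICAS)]
    return placement


# Example: a 200 MB file becomes 4 chunks, each mirrored on 3 of 5 servers.
print(place_chunks(200 * 1024 * 1024, ["cs1", "cs2", "cs3", "cs4", "cs5"]))
```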
If there is only one master, there is a potential bottleneck in the architecture, right? The role of the master is to tell clients which chunk servers hold which chunks, along with the associated metadata. Clients then interact directly with chunk servers for all subsequent operations, using the master only minimally. The master, therefore, never becomes a bottleneck.
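A sketch of that read path, under the same illustrative assumptions as above: the client asks the master once for a chunk's handle and replica locations, caches the answer, and then fetches the bytes directly from a chunk server. The master.lookup and chunk-server read calls are hypothetical stand-ins for the corresponding RPCs, and the sketch handles only reads that fall within a single chunk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


class GFSClientSketch:
    """Illustrative client: metadata from the master, data from chunk servers."""

    def __init__(self, master):
        self.master = master
        self.cache = {}  # (path, chunk_index) -> (handle, replica chunk servers)

    def read(self, path: str, offset: int, length: int) -> bytes:
        chunk_index = offset // CHUNK_SIZE  # which chunk of the file holds this offset
        key = (path, chunk_index)
        if key not in self.cache:
            # One small metadata call to the master; cached for later reads.
            self.cache[key] = self.master.lookup(path, chunk_index)
        handle, replicas = self.cache[key]
        # The bulk data transfer goes straight to a chunk server, never the master.
        return replicas[0].read(handle, offset % CHUNK_SIZE, length)
```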
2 "The Google File System," 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003.