Figure 6.8 A comparison of a MySQL SQL query with MongoDB's map and reduce functions. The queries
perform similar functions, but the MongoDB query can easily be distributed over hundreds of processors.
(Attribution to Rick Osborne)
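The figure itself isn't reproduced here, so the following is only a minimal Python sketch of the map and reduce idea the caption describes, not the MongoDB query from the figure. The `orders` records and the names `map_order`, `reduce_totals`, and `map_reduce` are hypothetical. The result is roughly what a SQL query such as `SELECT customer, SUM(total) FROM orders GROUP BY customer` would return.

```python
from collections import defaultdict

# Hypothetical records; the figure's actual data is not reproduced here.
orders = [
    {"customer": "alice", "total": 30},
    {"customer": "bob", "total": 20},
    {"customer": "alice", "total": 50},
]


def map_order(order):
    """Map step: emit one (key, value) pair per input record."""
    yield order["customer"], order["total"]


def reduce_totals(key, values):
    """Reduce step: combine every value emitted for a single key."""
    return sum(values)


def map_reduce(records, mapper, reducer):
    """Run the mapper over all records, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


result = map_reduce(orders, map_order, reduce_totals)
print(result)
```

Because each call to `map_order` looks at one record in isolation, and each call to `reduce_totals` looks at one key in isolation, the work could in principle be split across many processors. This single-process sketch only shows the shape of the computation.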
Traditional filesystems store data in a single location; if a drive fails, data is restored
from a backup drive. By default, files in HDFS are stored in three locations; if a drive
fails, the data it held is automatically re-replicated from the surviving copies to another
drive. It's possible to get the same functionality using a fault-tolerant system such as
RAID, but RAID is more expensive and more difficult to configure on commodity hardware.
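The replication behavior described above can be sketched as a toy in-memory model. This is an illustration only, not how HDFS is implemented: the `TinyCluster` class, its method names, and the least-loaded placement rule are all assumptions made for the example.

```python
REPLICATION_FACTOR = 3  # the HDFS default mentioned in the text


class TinyCluster:
    """Toy model (hypothetical): tracks which drives hold a copy of each block."""

    def __init__(self, drive_names):
        # Maps drive name -> set of block ids stored on that drive.
        self.drives = {name: set() for name in drive_names}

    def _least_loaded(self, candidates):
        # Assumed placement rule: prefer drives holding the fewest blocks.
        return sorted(candidates, key=lambda d: (len(self.drives[d]), d))

    def store(self, block_id):
        """Write the block to REPLICATION_FACTOR different drives."""
        for drive in self._least_loaded(self.drives)[:REPLICATION_FACTOR]:
            self.drives[drive].add(block_id)

    def locations(self, block_id):
        """Return the sorted names of the drives that hold the block."""
        return sorted(d for d, blocks in self.drives.items() if block_id in blocks)

    def fail_drive(self, name):
        """Remove a drive, then copy its blocks to other drives to restore the count."""
        lost_blocks = self.drives.pop(name)
        for block_id in lost_blocks:
            holders = set(self.locations(block_id))
            spares = self._least_loaded(d for d in self.drives if d not in holders)
            needed = REPLICATION_FACTOR - len(holders)
            for drive in spares[:needed]:
                self.drives[drive].add(block_id)


cluster = TinyCluster(["d1", "d2", "d3", "d4"])
cluster.store("blk-1")
before = cluster.locations("blk-1")
cluster.fail_drive("d1")
after = cluster.locations("blk-1")
print(before, after)
```

In this sketch no backup restore step is needed after the failure: the two surviving copies are enough to bring the block back up to three locations.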
HDFS is different: it uses a large block size (64 megabytes by default) to handle
data. Figure 6.9 shows how large HDFS blocks are compared to those of a typical
operating system.
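To give a feel for the difference in scale, here is a small arithmetic sketch. The 64-megabyte figure comes from the text; the 4-kilobyte operating system block size is an assumption chosen as a common value, and `blocks_needed` is a hypothetical helper.

```python
HDFS_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default stated in the text
OS_BLOCK_SIZE = 4 * 1024            # assumption: a common 4 KB filesystem block


def blocks_needed(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (rounded up)."""
    return -(-file_size // block_size)  # ceiling division on integers


one_gigabyte = 1024 * 1024 * 1024
hdfs_blocks = blocks_needed(one_gigabyte, HDFS_BLOCK_SIZE)
os_blocks = blocks_needed(one_gigabyte, OS_BLOCK_SIZE)
ratio = HDFS_BLOCK_SIZE // OS_BLOCK_SIZE
print(hdfs_blocks, os_blocks, ratio)
```

Under these assumptions, a 1-gigabyte file occupies 16 HDFS blocks but 262,144 operating system blocks, so the system has far fewer blocks to keep track of for large files.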
HDFS also has other properties that make it different from an ordinary filesystem.
You can't update the value of a few bytes in an existing block without deleting the old
block and adding an entirely new block. HDFS is designed for large blocks of immuta-
ble data that are created once and read many times. Efficient updates aren't a primary
consideration for HDFS.
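The write-once behavior can be illustrated with a toy block store in which blocks are never modified in place. This is a sketch of the idea only; `WriteOnceBlockStore` and its methods are hypothetical and are not part of any HDFS API.

```python
class WriteOnceBlockStore:
    """Toy store (hypothetical) whose blocks are created once and never edited."""

    def __init__(self):
        self._blocks = {}   # block id -> immutable bytes
        self._next_id = 0

    def add_block(self, data):
        """Store a new block and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = bytes(data)  # bytes objects cannot be mutated
        return block_id

    def read_block(self, block_id):
        return self._blocks[block_id]

    def delete_block(self, block_id):
        del self._blocks[block_id]

    def replace_bytes(self, block_id, offset, new_bytes):
        """'Update' a few bytes by deleting the old block and adding a new one."""
        old = self.read_block(block_id)
        updated = old[:offset] + new_bytes + old[offset + len(new_bytes):]
        self.delete_block(block_id)
        return self.add_block(updated)


store = WriteOnceBlockStore()
first_id = store.add_block(b"hello world")
second_id = store.replace_bytes(first_id, 0, b"J")
print(second_id, store.read_block(second_id))
```

Changing a single byte here costs a full rewrite of the block, which is cheap for 11 bytes but expensive for a 64-megabyte block. That is why a create-once, read-many workload fits this design better than frequent small updates.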
Although HDFS is considered a filesystem, and can be mounted like other filesys-
tems, you don't usually work with HDFS as you would an additional disk drive on your
Windows or UNIX system. An HDFS system wouldn't be a good choice for storing typical
Microsoft Office documents that are updated frequently. HDFS is designed to be a