Databases Reference
In-Depth Information
Figure 6.8
A comparison of a mySQL SQL query with MongoDB's map and reduce functions. The queries
perform similar functions, but the MongoDB query can easily be distributed over hundreds of processors.
(Attribution to Rick Osborne)
Traditional filesystems store data in a single location; if a drive fails, data is restored
from a backup drive. By default, files in
HDFS
are stored in three locations; if a drive
fails, the data is automatically replicated to another drive. It's possible to get the same
functionality using a fault-tolerant system like
RAID
drives. But
RAID
drives are more
expensive and difficult to configure on commodity hardware.
HDFS
s are different: they use a large (64 megabytes by default) block size to han-
dle data. Figure 6.9 shows how large
HDFS
blocks are compared to a typical operating
system.
HDFS
also has other properties that make it different from an ordinary filesystem.
You can't update the value of a few bytes in an existing block without deleting the old
block and adding an entirely new block.
HDFS
is designed for large blocks of immuta-
ble data that are created once and read many times. Efficient updates aren't a primary
consideration for
HDFS
.
Although
HDFS
is considered a filesystem, and can be mounted like other filesys-
tems, you don't usually work with
HDFS
as you would an additional disk drive on your
Windows or
UNIX
system. An
HDFS
system wouldn't be a good choice for storing typi-
cal Microsoft office documents that are updated frequently.
HDFS
is designed to be a