Using NoSQL to manage big data - Making Sense of NoSQL

Databases Reference

In-Depth Information

Figure 6.8 A comparison of a mySQL SQL query with MongoDB's map and reduce functions. The queries

perform similar functions, but the MongoDB query can easily be distributed over hundreds of processors.

(Attribution to Rick Osborne)

Traditional filesystems store data in a single location; if a drive fails, data is restored

from a backup drive. By default, files in HDFS are stored in three locations; if a drive

fails, the data is automatically replicated to another drive. It's possible to get the same

functionality using a fault-tolerant system like RAID drives. But RAID drives are more

expensive and difficult to configure on commodity hardware.

HDFS s are different: they use a large (64 megabytes by default) block size to han-

dle data. Figure 6.9 shows how large HDFS blocks are compared to a typical operating

system.

HDFS also has other properties that make it different from an ordinary filesystem.

You can't update the value of a few bytes in an existing block without deleting the old

block and adding an entirely new block. HDFS is designed for large blocks of immuta-

ble data that are created once and read many times. Efficient updates aren't a primary

consideration for HDFS .

Although HDFS is considered a filesystem, and can be mounted like other filesys-

tems, you don't usually work with HDFS as you would an additional disk drive on your

Windows or UNIX system. An HDFS system wouldn't be a good choice for storing typi-

cal Microsoft office documents that are updated frequently. HDFS is designed to be a

Search WWH ::

Custom Search

Home