the data. Because the data is distributed, HDFS can quickly detect faults
and failures and then recover automatically and transparently.
High throughput: Where most file systems strive for low-latency
operations, HDFS focuses on high throughput, even at the
expense of latency. This characteristic means that HDFS can stream
data to its clients to support analytical processing over large data sets,
and that it favors batch over interactive operations. With forward-looking
features like caching and tiered storage, HDFS is expected to become
better suited to interactive operations as well.
Support for large data sets: It's not uncommon for HDFS to contain
files that range in size from several gigabytes up to several
terabytes, and a single instance can hold data sets in excess of tens of
millions of files (all accomplished by scaling on cost-effective commodity
hardware).
Write-once read-many (WORM) principle: This is one of the
guiding principles of HDFS and is sometimes referred to as coherency.
Put more simply, data files in HDFS are written once and, when closed, are
never updated. This simplification enables the high level of throughput
obtained by HDFS.
Data locality: In a normal application scenario, a process requests
data from a source; the data is then transferred from the source over a
network to the requestor, which can then process it. This time-tested and
proven process often works fine on smaller data sets. As the size of the
data set grows, however, bottlenecks and hotspots begin to appear.
Server resources and networks can quickly become overwhelmed, and the
whole process breaks down. HDFS overcomes this limitation by
providing facilities or interfaces for applications to move the
computation to the data, rather than moving the data to the
computation.
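The data locality idea above can be illustrated with a small sketch. This is a toy scheduler, not the HDFS or Hadoop API: the function name assign_tasks, the node and block names, and the slots_per_node parameter are all hypothetical, chosen only to show what "moving the computation to the data" means in practice. For each block, the sketch prefers a node that already stores a copy of that block and falls back to a remote node only when no local node has capacity.

```python
from collections import defaultdict


def assign_tasks(block_locations, cluster_nodes, slots_per_node):
    """Assign one task per block, preferring a node that already holds the block.

    block_locations: dict of block id -> list of nodes storing a copy (hypothetical layout)
    cluster_nodes:   list of every node that can run tasks
    slots_per_node:  maximum number of tasks a single node may run
    Returns a dict of block id -> (chosen node, True if the task runs where the data is).
    """
    load = defaultdict(int)  # tasks assigned to each node so far
    assignment = {}
    for block, holders in block_locations.items():
        # First choice: a node that stores the block and still has a free slot.
        local = [n for n in holders if load[n] < slots_per_node]
        if local:
            node, is_local = min(local, key=lambda n: load[n]), True
        else:
            # Fallback: any node with a free slot; the block must cross the network.
            free = [n for n in cluster_nodes if load[n] < slots_per_node]
            if not free:
                raise RuntimeError("no free task slots left in the cluster")
            node, is_local = min(free, key=lambda n: load[n]), False
        load[node] += 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    # Hypothetical placement of four blocks across five nodes.
    locations = {
        "blk_1": ["node-a", "node-b", "node-c"],
        "blk_2": ["node-b", "node-c", "node-d"],
        "blk_3": ["node-a", "node-d", "node-e"],
        "blk_4": ["node-c", "node-d", "node-e"],
    }
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    for block, (node, is_local) in assign_tasks(locations, nodes, 2).items():
        print(block, "->", node, "(local)" if is_local else "(remote read)")
```

When every task lands on a node that already holds its block, only the comparatively small results have to travel over the network, which is the bottleneck the paragraph above describes.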
Because HDFS is one of the more critical pieces of the Hadoop ecosystem, it's
worth spending a little extra time to understand its architecture and how it
enables the aforementioned capabilities.
Explaining the HDFS Architecture
Before discussing machine roles or nodes, let's look at the most fundamental
concept within HDFS: the block. You may already be familiar with the