7
Hadoop, R, and Python
“Information is the oil of the 21st Century, and analytics is the combustion engine.”
- Peter Sondergaard
I've already covered the essentials of Hadoop at some length in Book One of
A Simple Introduction to Data Science, and in Unicorns Among Us: Understanding the
High Priests of Data Science, so I won't do the same here. Nevertheless, for the
uninitiated, I will give a brief crash course recapping Hadoop fundamentals. The
open-source programming languages R and Python, not discussed in the previous volume,
will get rather more extensive coverage.
OK. Hadoop in brief. Buckle your seat-belts.
Hadoop is arguably the most valuable bedrock tool in all of Data Science. This powerful
open-source software platform lets Data Scientists store and process Big Data on very
large clusters of commodity hardware, spreading both the storage and the processing
across many machines so that massive workloads finish quickly.
Additional chief attributes and benefits of Hadoop are economy, vast computing power,
easy scalability, great flexibility in storage, and built-in data protection enhanced with “self-
healing” capabilities.
Economy? Yes, open-source software is free and runs on relatively inexpensive
commodity hardware. As regards computing power and scalability, Hadoop's distributed
computing model lets Data Scientists rapidly process extremely large volumes of data,
whether that data is structured or not, and add capacity simply by increasing the number
of computing nodes. As regards storage, Hadoop allows you to store as much data as you'd
care to, whether that data is structured or unstructured, without preprocessing. In other
words: store data now, process it and use it later, or not at all. As regards data
protection, Hadoop protects both data and the ongoing application processing from
hardware failure by its use of distributed nodes. If a node fails, its work is
automatically redirected to other nodes. As an additional backup, Hadoop automatically
creates and stores redundant copies of data.
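To make the "self-healing" idea concrete, here is a toy Python simulation of redundant copies surviving a node failure. It is only a sketch under stated assumptions, not Hadoop's actual code: the function names (place_blocks, readable_blocks, re_replicate) are invented for illustration, the placement is random rather than rack-aware, and the replication factor of 3 simply mirrors HDFS's default setting.

```python
import random

def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes.

    Toy random placement (an assumption); real HDFS uses a rack-aware policy.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}

def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {block for block, holders in placement.items() if holders - failed}

def re_replicate(placement, nodes, failed, replication=3, seed=1):
    """Copy under-replicated blocks to live nodes until each has `replication` replicas."""
    rng = random.Random(seed)
    live = [n for n in nodes if n not in failed]
    healed = {}
    for block, holders in placement.items():
        survivors = holders - failed
        candidates = [n for n in live if n not in survivors]
        missing = min(replication - len(survivors), len(candidates))
        # A block with no surviving replica cannot be copied from anywhere.
        if survivors and missing > 0:
            survivors = survivors | set(rng.sample(candidates, missing))
        healed[block] = survivors
    return healed

# Six hypothetical nodes, ten data blocks, three copies of each block.
nodes = [f"node{i}" for i in range(6)]
placement = place_blocks(num_blocks=10, nodes=nodes)

# Simulate one machine dying.
failed = {"node2"}
still_readable = readable_blocks(placement, failed)
healed = re_replicate(placement, nodes, failed)

print(len(still_readable), "of", len(placement), "blocks readable after failure")
```

Because every block lives on three different nodes, losing any single node cannot remove the last copy of a block, so all ten blocks stay readable; the re_replicate step then restores the lost copies on the surviving machines.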