How Hadoop Tools Can Help
Hadoop tools are a good fit for your big data needs. When I refer to Hadoop tools, I mean the whole Apache
(www.apache.org) tool set related to big data. With its community-based, open-source approach to software
development, the Apache Software Foundation (ASF) has had a huge impact on both software development for big
data and the overall approach that has been taken in this field. It also fosters significant cross-pollination of
both ideas and development among the parties involved, for example, Google, Facebook, and LinkedIn. Apache runs
an incubator program in which projects are accepted and matured to ensure that they are robust and production worthy.
Hadoop was developed by Apache as a distributed parallel big data processing system. It was written in
Java and released under an Apache license. It assumes that failures will occur, and so it is designed to offer both
hardware and data redundancy automatically. The Hadoop platform offers a wide tool set for many of the big data
functions that I have mentioned. The original Hadoop development was influenced by Google's MapReduce and
the Google File System.
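To make the MapReduce idea mentioned above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is an illustration only and does not use Hadoop's actual Java API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions introduced for this example, and nothing here is distributed or fault tolerant.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process (hypothetical helper)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big tools", "hadoop tools process big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce calls would run in parallel on many machines, with the framework handling the grouping step and rerunning failed tasks; the sketch only shows the shape of the computation.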
The following list is a sampling of tools available in the Hadoop ecosystem. Those marked in boldface are
introduced in the chapters that follow:
• Ambari: Hadoop management and monitoring
• Avro: Data serialization system
• Chukwa: Data collection and monitoring
• Hadoop: Hadoop distributed storage platform
• Hama: BSP scientific computing framework
• HBase: Hadoop NoSQL non-relational database
• Hive: Hadoop data warehouse
• Hue: Hadoop web interface for analyzing data
• Mahout: Scalable machine learning platform
• Map/Reduce: Algorithm used by the Hadoop MR component
• Nutch: Web crawler
• Oozie: Workflow scheduler
• Pentaho: Open-source analytics tool set
• Pig: Data analysis high-level language
• Solr: Search platform
• Sqoop: Bulk data-transfer tool
• Storm: Distributed real-time computation system
• Yarn: Map/Reduce in Hadoop Version 2
• ZooKeeper: Hadoop centralized configuration system
Taken together, the tools from the ASF, Lucene, and other providers, some of which are listed here, provide a rich
set of functions that will allow you to manipulate your data.