How Hadoop Tools Can Help
Hadoop tools are a good fit for your big data needs. When I refer to Hadoop tools, I mean the whole Apache
(www.apache.org) tool set related to big data. The Apache Software Foundation (ASF) takes a community-based,
open-source approach to software development, and it has had a huge impact on both software development for
big data and the overall approach taken in this field. It also fosters significant cross-pollination of both ideas and
development among the parties involved, for example, Google, Facebook, and LinkedIn. Apache runs an incubator
program in which projects are accepted and matured to ensure that they are robust and production worthy.
Hadoop is developed under the ASF as a distributed, parallel big data processing system. It is written in
Java and released under the Apache license. It assumes that failures will occur, so it is designed to provide both
hardware and data redundancy automatically. The Hadoop platform offers a wide tool set for many of the big data
functions that I have mentioned. The original Hadoop development was influenced by Google's MapReduce and
the Google File System.
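
To make the Map/Reduce idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration, and everything runs in a single process, whereas Hadoop would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on a cluster.
    sample = ["big data needs big tools", "Hadoop tools process big data"]
    print(word_count(sample))
```

Because each mapper call looks at only one line and each reducer call looks at only one key, the work can in principle be split across machines, which is the property a system like Hadoop exploits.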
The following list is a sampling of tools available in the Hadoop ecosystem. Those marked in boldface are
introduced in the chapters that follow:
Ambari       Hadoop management and monitoring
Avro         Data serialization system
Chukwa       Data collection and monitoring
Hadoop       Hadoop distributed storage platform
Hama         BSP scientific computing framework
HBase        Hadoop NoSQL non-relational database
Hive         Hadoop data warehouse
Hue          Hadoop web interface for analyzing data
Mahout       Scalable machine learning platform
Map/Reduce   Algorithm used by the Hadoop MR component
Nutch        Web crawler
Oozie        Workflow scheduler
Pentaho      Open-source analytics tool set
Pig          Data analysis high-level language
Solr         Search platform
Sqoop        Bulk data-transfer tool
Storm        Distributed real-time computation system
Yarn         Map/Reduce in Hadoop Version 2
ZooKeeper    Hadoop centralized configuration system
Taken together, the tools from the ASF, Lucene, and other providers, some of which are listed here, offer a rich
set of functions for manipulating your data.
 