Appendix B. Cloudera's Distribution Including Apache Hadoop
Cloudera's Distribution Including Apache Hadoop (hereafter CDH) is an integrated Apache
Hadoop-based stack containing all the components needed for production, tested and
packaged to work together. Cloudera makes the distribution available in a number of different
formats: Linux packages, virtual machine images, tarballs, and tools for running CDH in
the cloud. CDH is free, released under the Apache 2.0 license, and available at
http://www.cloudera.com/cdh.
As of CDH 5, the following components are included, many of which are covered
elsewhere in this book:
Apache Avro
A cross-language data serialization library; includes rich data structures, a fast/compact
binary format, and RPC
Apache Crunch
A high-level Java API for writing data processing pipelines that can run on MapReduce
or Spark
Apache DataFu (incubating)
A library of useful statistical UDFs for doing large-scale analyses
Apache Flume
Highly reliable, configurable streaming data collection
Apache Hadoop
Highly scalable data storage (HDFS), resource management (YARN), and processing
(MapReduce)
Apache HBase
Column-oriented real-time database for random read/write access
Apache Hive
SQL-like queries and tables for large datasets
Hue
Web UI to make it easy to work with Hadoop data