Chapter 15. Sqoop
Aaron Kimball
A great strength of the Hadoop platform is its ability to work with data in several different forms. HDFS can reliably store logs and other data from a plethora of sources, and MapReduce programs can parse diverse ad hoc data formats, extracting relevant information and combining multiple datasets into powerful results.
But to interact with data in storage repositories outside of HDFS, MapReduce programs need to use external APIs. Often, valuable data in an organization is stored in structured data stores such as relational database management systems (RDBMSs). Apache Sqoop is an open source tool that allows users to extract data from a structured data store into Hadoop for further processing. This processing can be done with MapReduce programs or other higher-level tools such as Hive. (It's even possible to use Sqoop to move data from a database into HBase.) When the final results of an analytic pipeline are available, Sqoop can export these results back to the data store for consumption by other clients.
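To give a flavor of that workflow, here is a minimal command-line sketch of an import followed by an export. The JDBC connection string, table names, and HDFS directories are illustrative placeholders, not values taken from this chapter.

# Import the contents of an RDBMS table into HDFS (URL, table, and paths are illustrative)
% sqoop import --connect jdbc:mysql://dbserver.example.com/corp \
      --table widgets --target-dir /data/widgets -m 4

# Export processed results from HDFS back into a database table
% sqoop export --connect jdbc:mysql://dbserver.example.com/corp \
      --table sales_summary --export-dir /results/sales_summary

Both operations run as MapReduce jobs, which is how Sqoop parallelizes the transfer; the mechanics behind each step are covered later in the chapter.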
In this chapter, we'll take a look at how Sqoop works and how you can use it in your data
processing pipeline.