Chapter 15. Sqoop
Aaron Kimball
A great strength of the Hadoop platform is its ability to work with data in several different forms. HDFS can reliably store logs and other data from a plethora of sources, and MapReduce programs can parse diverse ad hoc data formats, extracting relevant information and combining multiple datasets into powerful results.
But to interact with data in storage repositories outside of HDFS, MapReduce programs need to use external APIs. Often, valuable data in an organization is stored in structured data stores such as relational database management systems (RDBMSs). Apache Sqoop is an open source tool that allows users to extract data from a structured data store into Hadoop for further processing. This processing can be done with MapReduce programs or other higher-level tools such as Hive. (It's even possible to use Sqoop to move data from a database into HBase.) When the final results of an analytic pipeline are available, Sqoop can export these results back to the data store for consumption by other clients.
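To give a flavor of that workflow, here is a minimal command-line sketch of an import followed by an export. The JDBC connection string, table names, and HDFS directories are illustrative placeholders, not values taken from this chapter.

# Import the contents of an RDBMS table into HDFS (URL, table, and paths are illustrative)
% sqoop import --connect jdbc:mysql://dbserver.example.com/corp \
      --table widgets --target-dir /data/widgets -m 4

# Export processed results from HDFS back into a database table
% sqoop export --connect jdbc:mysql://dbserver.example.com/corp \
      --table sales_summary --export-dir /results/sales_summary

Both operations run as MapReduce jobs, which is how Sqoop parallelizes the transfer; the mechanics behind each step are covered later in the chapter.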
In this chapter, we'll take a look at how Sqoop works and how you can use it in your data
processing pipeline.