FIGURE 4.20 Sqoop1 architecture.
implemented as it can be in the current Hadoop ecosystem. (For further details on HCatalog, please
see the Apache Foundation and HortonWorks websites.)
Sqoop
As the Hadoop ecosystem evolves, we will find the need to integrate data from other existing “enterprise” data platforms, including the data warehouse, metadata engines, enterprise systems (ERP, SCM), and transactional systems. Not all of this data can simply be moved to Hadoop, since its small volumes, low-latency requirements, and computational patterns are not oriented toward Hadoop workloads. To provide a connection between Hadoop and the RDBMS platforms, Sqoop was developed as the connector. There are two versions, Sqoop1 and Sqoop2. Let us take a quick look at this technology.
Sqoop1
In the first release of Sqoop, the design goals were very simple (Figure 4.20):
• Export/import data from the enterprise data warehouse, relational databases, and NoSQL databases.
• Connector-based architecture with plugins from vendors.
• No metadata store.
• Use Hive and HDFS for data processing (see the sketch after this list).
• Use Oozie for scheduling and managing jobs.
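To illustrate the Hive integration, the following is a minimal sketch of a Sqoop1 import that loads a relational table straight into a Hive table; the connection string, table names, and credentials are placeholders for illustration, not values from the text:

# Minimal sketch: import a (hypothetical) PERSON table from MySQL into Hive.
# --hive-import tells Sqoop to load the imported data into a Hive table.
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --table PERSON \
  --username test \
  --password **** \
  --hive-import \
  --hive-table person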
Currently, you can download and install Sqoop from the Apache Foundation website or from any Hadoop distribution. The installation is manual and requires the configuration steps to be followed exactly, without missing any.
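For example, a typical manual setup looks like the sketch below; the install path and driver jar name are assumptions for illustration, not values from the text:

# Assumed install location; adjust for your environment.
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# Sqoop1 picks up JDBC drivers from its classpath, so the vendor's
# driver jar (here MySQL's) is copied into Sqoop's lib directory.
cp mysql-connector-java-*.jar $SQOOP_HOME/lib/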
Sqoop is driven completely by the client-side installation and depends heavily on JDBC technology, as the first release of Sqoop was developed in Java. In the workflow shown in Figure 4.20, you can import and export data from any database with simple commands that you can execute from a command-line interface (CLI), for example:
Import syntax:
sqoop import --connect jdbc:mysql://localhost/testdb \
  --table PERSON --username test --password ****
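The reverse direction uses the export tool. The sketch below pushes HDFS data back into the same (hypothetical) PERSON table; the HDFS path is illustrative:

# Export rows from an HDFS directory back into the RDBMS table.
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --table PERSON \
  --username test \
  --password **** \
  --export-dir /user/hadoop/person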
 