FIGURE 4.20 Sqoop1 architecture.
implemented as it can be in the current Hadoop ecosystem. (For further details on HCatalog, please
see the Apache Foundation and HortonWorks websites.)
Sqoop
As the Hadoop ecosystem evolves, we will find the need to integrate data from other existing “enterprise” data platforms, including the data warehouse, metadata engines, enterprise systems (ERP, SCM), and transactional systems. Not all of this data can simply be moved to Hadoop, since its small volumes, low-latency requirements, and computational patterns are not oriented toward Hadoop workloads. To provide a connection between Hadoop and the RDBMS platforms, Sqoop was developed as the connector. There are two versions, Sqoop1 and Sqoop2. Let us take a quick look at this technology.
Sqoop1
In the first release of Sqoop, the design goals were very simple (Figure 4.20):
• Export/import data from the enterprise data warehouse, relational databases, and NoSQL databases.
• Connector-based architecture with plugins from vendors.
• No metadata store.
• Use Hive and HDFS for data processing (see the sketch after this list).
• Use Oozie for scheduling and managing jobs.
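To illustrate the Hive integration, the following is a minimal sketch of a Sqoop1 import that loads a relational table straight into a Hive table; the connection string, table names, and credentials are placeholders for illustration, not values from the text:

# Minimal sketch: import a (hypothetical) PERSON table from MySQL into Hive.
# --hive-import tells Sqoop to load the imported data into a Hive table.
sqoop import \
  --connect jdbc:mysql://localhost/testdb \
  --table PERSON \
  --username test \
  --password **** \
  --hive-import \
  --hive-table person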
Currently, you can download and install Sqoop from the Apache Foundation website or from any Hadoop distribution. The installation is manual and requires the configuration steps to be followed exactly, without missing any.
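For example, a typical manual setup looks like the sketch below; the install path and driver jar name are assumptions for illustration, not values from the text:

# Assumed install location; adjust for your environment.
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# Sqoop1 picks up JDBC drivers from its classpath, so the vendor's
# driver jar (here MySQL's) is copied into Sqoop's lib directory.
cp mysql-connector-java-*.jar $SQOOP_HOME/lib/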
Sqoop is driven completely by the client-side installation and depends heavily on JDBC technology, as the first release of Sqoop was developed in Java. In the workflow shown in Figure 4.20, you can import and export data from any database with simple commands that you can execute from a command-line interface (CLI), for example:
Import syntax:
sqoop import --connect jdbc:mysql://localhost/testdb \
  --table PERSON --username test --password ****
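The reverse direction uses the export tool. The sketch below pushes HDFS data back into the same (hypothetical) PERSON table; the HDFS path is illustrative:

# Export rows from an HDFS directory back into the RDBMS table.
sqoop export \
  --connect jdbc:mysql://localhost/testdb \
  --table PERSON \
  --username test \
  --password **** \
  --export-dir /user/hadoop/person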
 