The import command will generate a series of tasks (a sketch of a typical import invocation follows the list):
Generate SQL code.
Execute SQL code.
Generate MapReduce jobs.
Execute MapReduce jobs.
Transfer data to local files or HDFS.
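For reference, a minimal sketch of a matching import invocation, assuming the same hypothetical MySQL database and a CLIENTS source table (the table name and target directory are assumptions, not from the original):

sqoop import --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS --username test --password **** \
  --target-dir /user/localadmin/CLIENTS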
Export syntax:

sqoop export --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS_INTG --username test --password **** \
  --export-dir /user/localadmin/CLIENTS
This command will generate a series of tasks:
Generate MapReduce jobs.
Execute MapReduce jobs.
Transfer data from local files or HDFS.
Compile SQL code.
Create or insert into CLIENTS_INTG table.
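By default the export step appends rows to the target table with INSERT statements; Sqoop1 can also update existing rows. A minimal sketch, assuming a client_id key column (the column name is hypothetical):

sqoop export --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS_INTG --username test --password **** \
  --export-dir /user/localadmin/CLIENTS \
  --update-key client_id --update-mode allowinsert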
Many features of Sqoop1 are easy to learn and implement; for example, on the command line you can specify whether the import goes directly to Hive, HDFS, or HBase. There are direct connectors to the most popular databases: Oracle, SQL Server, MySQL, Teradata, and PostgreSQL.
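As a sketch of the direct-to-Hive option, the import above can be redirected into a Hive table with the --hive-import flag (the Hive table name here is an assumption):

sqoop import --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS --username test --password **** \
  --hive-import --hive-table clients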
There are evolving challenges with Sqoop1, including:
Cryptic command-line arguments.
Nonsecure connectivity, which poses a security risk.
No metadata repository, which limits reuse.
Program-driven installation and management.
Sqoop2
Sqoop2 is the next generation of the data transfer architecture, designed to address the limitations of Sqoop1, namely:
Sqoop2 has a web-enabled user interface (UI).
Sqoop2 will be driven by a Sqoop server architecture.
Sqoop2 will provide greater connector flexibility; apart from JDBC, many native connectivity options can be customized by providers.
Sqoop2 will have a Representational State Transfer (REST) API interface (a sketch follows this list).
Sqoop2 will have its own metadata store.
Sqoop2 will add credentials management capabilities, which will provide trusted connection capabilities.
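As an illustration of the REST interface, a minimal sketch, assuming a Sqoop2 server running on its default port (12000) and the version resource exposed by early Sqoop2 releases (the host, port, and path are assumptions):

curl http://localhost:12000/sqoop/version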
The proposed architecture of Sqoop2 is shown in Figure 4.21. For more information on Sqoop status and issues, please see the Apache Foundation website.
Hadoop summary
In summary, as we see from this section and the discussion on Hadoop and its ecosystem of technologies, there are a lot of processing capabilities in this framework to manage, compute, and store large volumes of data.