The import command will generate a series of tasks (a sketch of a typical import invocation follows the list):
Generate SQL code.
Execute SQL code.
Generate MapReduce jobs.
Execute MapReduce jobs.
Transfer data to local files or HDFS.
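For reference, a minimal sketch of a matching import invocation, assuming the same hypothetical MySQL database and a CLIENTS source table (the table name and target directory are assumptions, not from the original):

sqoop import --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS --username test --password **** \
  --target-dir /user/localadmin/CLIENTS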
Export syntax:

sqoop export --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS_INTG --username test --password **** \
  --export-dir /user/localadmin/CLIENTS
This command will generate a series of tasks:
Generate MapReduce jobs.
Execute MapReduce jobs.
Transfer data from local files or HDFS.
Compile SQL code.
Create or insert into CLIENTS_INTG table.
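By default the export step appends rows to the target table with INSERT statements; Sqoop1 can also update existing rows. A minimal sketch, assuming a client_id key column (the column name is hypothetical):

sqoop export --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS_INTG --username test --password **** \
  --export-dir /user/localadmin/CLIENTS \
  --update-key client_id --update-mode allowinsert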
Many features of Sqoop1 are easy to learn and implement; for example, on the command line you can specify whether the import goes directly to Hive, HDFS, or HBase. There are direct connectors to the most popular databases: Oracle, SQL Server, MySQL, Teradata, and PostgreSQL.
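As a sketch of the direct-to-Hive option, the import above can be redirected into a Hive table with the --hive-import flag (the Hive table name here is an assumption):

sqoop import --connect jdbc:mysql://localhost/testdb \
  --table CLIENTS --username test --password **** \
  --hive-import --hive-table clients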
There are evolving challenges with Sqoop1, including:
Cryptic command-line arguments.
Nonsecure connectivity, which poses a security risk.
No metadata repository, which limits reuse.
Program-driven installation and management.
Sqoop2
Sqoop2 is the next generation of the data transfer architecture, designed to address the limitations of Sqoop1, namely:
Sqoop2 has a web-enabled user interface (UI).
Sqoop2 will be driven by a Sqoop server architecture.
Sqoop2 will provide greater connector flexibility; apart from JDBC, many native connectivity options can be customized by providers.
Sqoop2 will have a Representational State Transfer (REST) API interface (a sketch follows this list).
Sqoop2 will have its own metadata store.
Sqoop2 will add credentials management capabilities, which will provide trusted connection capabilities.
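As an illustration of the REST interface, a minimal sketch, assuming a Sqoop2 server running on its default port (12000) and the version resource exposed by early Sqoop2 releases (the host, port, and path are assumptions):

curl http://localhost:12000/sqoop/version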
The proposed architecture of Sqoop2 is shown in Figure 4.21. For more information on Sqoop status and issues, please see the Apache Foundation website.
Hadoop summary
In summary, as we see from this section and the discussion on Hadoop and its ecosystem of technologies, there are a lot of processing capabilities in this framework to manage, compute, and store large volumes of data.