Effective Big Data ETL with SSIS, Pig, and Sqoop - Microsoft Big Data Solutions

on your SSIS systems. Using it along with SSIS can deliver the best of

both worlds: a solution that scales with Hadoop and that has the extensive

integration capabilities of SSIS. In addition, if the data doesn't need to leave

Hadoop storage, Pig is a natural fit.

Use Cases for Sqoop

Sqoop proves most useful in the following cases:

• There is little need to transform the data being moved between SQL

Server and Hadoop.

• The IT staff isn't comfortable with SSIS or Pig.

• Ease of use is a higher priority than performance.

• Your Hadoop data is stored in standard Hadoop binary file formats.

Sqoop primarily comes into play for either simple table replication scenarios

or for one-time data import and export from Hadoop. Because of the

reduced control over transformations and lack of fine-grained tuning

capability, it generally doesn't work as well in production-level data

integration unless the integration is limited to replicating tables.
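
As a rough illustration of the simple table replication case, the sketch below assembles a plain Sqoop import command for one SQL Server table with no transformation step. The arguments used (--connect, --username, --password-file, --table, --target-dir, --num-mappers) are standard Sqoop import options, but the helper name build_sqoop_import, the host, database, table, user, and HDFS paths are all hypothetical placeholders, not values from this chapter.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username,
                       password_file, num_mappers=4):
    """Return the argument list for a plain, untransformed table import.

    Hypothetical helper for illustration; it only assembles the command
    and does not run Sqoop.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source SQL Server
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # whole table, copied as-is
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]


# All values below are placeholders (assumptions), not real endpoints.
cmd = build_sqoop_import(
    jdbc_url="jdbc:sqlserver://sqlhost:1433;databaseName=Sales",
    table="Customers",
    target_dir="/data/sales/customers",
    username="etl_user",
    password_file="/user/etl/.sqoop_pw",
)

# Print a shell-safe version of the command for review or scheduling.
print(shlex.join(cmd))
```

The command exposes little beyond the source table, the destination, and the degree of parallelism, which reflects the trade-off described above: it is easy to set up for straight table copies but leaves little room for transformation or fine-grained tuning.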

Summary

This chapter reviewed multiple methods of integrating your existing SQL

Server environment with your big data environment, along with the pros

and cons of each. SSIS was discussed, along with how to set it up for

communication with Hive via ODBC and how to get the best performance

from it. Sqoop was also covered, as a useful tool for handling bulk data

import and export from Hadoop. A third option, Pig, was discussed, with

a description of how you can leverage it to take advantage of Hadoop

scalability and how it can be combined with SSIS to create a better

solution overall. The chapter concluded by looking at when each tool is most

applicable.
