on your SSIS systems. Using it along with SSIS can deliver the best of
both worlds: a solution that scales with Hadoop and that has the extensive
integration capabilities of SSIS. In addition, if the data doesn't need to leave
Hadoop storage, Pig is a natural fit.
Use Cases for Sqoop
Sqoop proves most useful in the following cases:
• There is little need to transform the data being moved between SQL
Server and Hadoop.
• The IT staff isn't comfortable with SSIS or Pig.
• Ease of use is a higher priority than performance.
• Your Hadoop data is stored in standard Hadoop binary file formats.
Sqoop primarily comes into play for either simple table replication scenarios
or for one-time data import and export from Hadoop. Because of the
reduced control over transformations and lack of fine-grained tuning
capability, it generally doesn't work as well in production-level data
integration unless the integration is limited to replicating tables.
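To make the simple table replication case concrete, here is a minimal sketch that assembles a Sqoop import command for one SQL Server table. The helper name build_sqoop_import, the host sqlhost, the database Sales, the table Customers, the target directory, and the user etl_user are all hypothetical placeholders. The sketch assumes the Microsoft JDBC driver for SQL Server is available to Sqoop and that the server listens on port 1433.

```python
import shlex


def build_sqoop_import(server, database, table, target_dir, username, mappers=1):
    """Return the argument list for a plain Sqoop import of one SQL Server table.

    All values passed in are illustrative; nothing here is executed
    against a real cluster.
    """
    # Assumed JDBC URL form for SQL Server on its default port.
    jdbc_url = f"jdbc:sqlserver://{server}:1433;databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                          # prompt for the password at run time
        "--table", table,              # whole-table copy, no transformation
        "--target-dir", target_dir,    # HDFS directory that receives the data
        "--num-mappers", str(mappers), # parallelism is one of the few tuning knobs
    ]


# Hypothetical values, for illustration only.
cmd = build_sqoop_import("sqlhost", "Sales", "Customers", "/data/customers", "etl_user")
print(shlex.join(cmd))
```

The command copies the table as it stands, which is why Sqoop suits replication but not integrations that need transformation logic or fine-grained tuning.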
Summary
This chapter reviewed multiple methods of integrating your existing SQL
Server environment with your big data environment, along with the pros
and cons of each. SSIS was discussed, along with how to set it up for
communication with Hive via ODBC and how to get the best performance
from it. Sqoop was also covered, as a useful tool for handling bulk data
import and export from Hadoop. A third option, Pig, was discussed, with
a description of how you can use it to take advantage of Hadoop
scalability and how combining it with SSIS can produce a better overall
solution. The chapter concluded by looking at when each tool is most
applicable.