Choosing the Right Tool
You have a variety of options for integrating your SQL Server and Hadoop
environments. As with any set of tools, each option has pros and cons, and
the best one to use depends on your use case and specific requirements.
Existing skillsets can also influence the choice. It is worth becoming
familiar with the strengths of each tool, because some scenarios are much
easier to accomplish in one tool than in another.
The following sections lay out some of the advantages and disadvantages of
each tool, as well as scenarios where you might want to use it.
Use Cases for SSIS
SSIS works well in cases where you have the following:
• Staff trained on SSIS
• The need to do additional transformations on the data after reading it
from Hive or prior to writing it to Hadoop
• A need for performance tuning
SSIS is the best fit for shops that are already invested in SSIS and that
need to incorporate Hadoop data into existing data-integration processes.
However, SSIS does not have an inherent ability to scale out. In cases where
there is significant data processing to be done, the best results can come
from a hybrid solution leveraging both SSIS and Pig. SSIS delivers the
integration with other data sources and destinations, and Pig delivers the
ability to scale transformation of data across a Hadoop cluster.
Use Cases for Pig
Pig is best used in the following cases:
• The amount of data to be processed is too much to be handled in SSIS.
• You need to take advantage of the scalability of Hadoop.
• Your IT staff is comfortable learning a new language and tool.
• Your Hadoop data is stored in standard Hadoop binary file formats.
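The rules of thumb in the SSIS and Pig lists can be sketched as a small checklist function. This is only an illustration, not a formal decision procedure: the Scenario fields and the suggest_tool name are hypothetical, and the three yes/no questions are a simplification of the criteria above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical summary of an integration scenario (illustrative only)."""
    staff_trained_on_ssis: bool       # existing SSIS skills in the shop
    integrates_other_sources: bool    # Hadoop data joins existing integration flows
    needs_scale_out: bool             # more data than a single SSIS host can handle


def suggest_tool(s: Scenario) -> str:
    """Return "SSIS", "Pig", or "SSIS + Pig" using the rules of thumb above."""
    ssis_fits = s.staff_trained_on_ssis or s.integrates_other_sources
    if s.needs_scale_out and ssis_fits:
        # Hybrid: Pig scales the transformations, SSIS handles integration.
        return "SSIS + Pig"
    if s.needs_scale_out:
        return "Pig"
    # Assumption: without a scale-out requirement, default to SSIS.
    return "SSIS"


if __name__ == "__main__":
    print(suggest_tool(Scenario(True, True, True)))
```

A shop with SSIS skills and a large transformation workload lands on the hybrid option, which matches the recommendation in the SSIS section.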
Pig proves quite useful when you need the data transformation to happen
on your Hadoop cluster so that the process scales and conserves resources