Chapter 8
Effective Big Data ETL with SSIS, Pig, and Sqoop
What You Will Learn in This Chapter:
• Moving Data Between SQL Server and Hadoop
• Using SSIS to Integrate
• Using Sqoop for Importing and Exporting
• Using Pig to Transform Data
• Choosing the Right Tool
A number of tools are available to help you move data between your Hadoop
environment and SQL Server. This chapter covers three common ones: SQL
Server Integration Services, Sqoop, and Pig.
SQL Server Integration Services (SSIS) is used in many SQL Server
environments to import, export, and transform data. It can integrate with
many different data systems, not just SQL Server, and supports a number
of built-in transformations. You can also extend it with custom
transformations to cover anything not supported “out of the
box.” This extensibility enables it to work with Hive as both a source and
a destination for data.
Sqoop is a tool designed to handle moving data between Hadoop and
relational databases. Although it doesn't offer the full range of
transformation capabilities that SSIS does, it is quick and easy to set up and use.
Pig enables users to analyze large data sets. It supports a number of built-in
transformations for the data, and additional transformations can be added as
user-defined functions through custom coding. It was originally developed as
a way to reduce the complexity of writing MapReduce jobs, but it has evolved
into a fully featured transformation tool for Hadoop data.
Because each of these tools has strengths and weaknesses, the final part of
this chapter focuses on helping you decide which tool is the best choice for
different scenarios you may encounter when moving your data.