Therefore, writing to Hadoop from Integration Services is best
accomplished by writing a file out to the file system and then moving it into
the Hadoop file system.
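This two-step pattern (write the file locally, then copy it into HDFS) can be sketched as follows. The use of Python and the specific paths are illustrative assumptions; the only command taken as given is the standard Hadoop shell's fs -put operation, which copies a local file into the Hadoop file system:

```python
import subprocess

def build_put_command(local_path, hdfs_path):
    # 'hadoop fs -put <local> <hdfs>' copies a local file into HDFS.
    return ["hadoop", "fs", "-put", local_path, hdfs_path]

def stage_file_to_hdfs(local_path, hdfs_path):
    # Step 1 is assumed to have happened already: an SSIS data flow wrote
    # local_path out through a Flat File Destination.
    # Step 2: move the staged file into HDFS via the Hadoop command line.
    subprocess.run(build_put_command(local_path, hdfs_path), check=True)
```

The same command could just as easily be issued from an SSIS Execute Process task, provided the package runs on a machine where the Hadoop client tools are installed.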
The next sections cover these operations in detail, so that you can configure your SSIS packages to both retrieve data from Hadoop and move data into it.
The instructions assume the use of the Hortonworks Hadoop distribution.
Hadoop can be installed on the same computer where you have SSIS
installed. However, in a production environment, these will likely be on two
different machines. This does present a few additional constraints because
SSIS cannot currently interact directly with Hadoop, and without a local
installation of Hadoop, it cannot access the Hadoop tools. The approaches
to work around these constraints are covered next.
NOTE
At the time of this writing, SSIS does not support direct connectivity to
the Hadoop file system. However, Microsoft has announced that they
are working on a number of additional tasks and components for SSIS
to allow for better interaction with Hadoop. These components are
planned for release on CodePlex (http://www.codeplex.com) as open source.
There are a few options that require a little more setup and some custom coding in SSIS. The Win HDFS Managed Library (http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx) allows you to access the HDFS file system from a script task or component in SSIS. You can also use FTPS (not supported natively in SSIS; it will require a custom task) to upload files to HDFS on the Windows distribution. Finally, you can use WebHDFS, a REST API that can be called from a custom script in SSIS.
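A WebHDFS upload is a two-step exchange: a PUT to the NameNode's REST endpoint returns a redirect naming the DataNode that will accept the file contents, and a second PUT sends the data there. The sketch below only builds the first-step URL; the host name, user name, and file path are illustrative placeholders, and the port assumes the default NameNode HTTP port of Hadoop distributions of this era:

```python
import urllib.parse

# Default NameNode HTTP port in Hadoop 1.x-era distributions (an assumption;
# check your cluster's configuration).
WEBHDFS_PORT = 50070

def webhdfs_create_url(host, hdfs_path, user):
    # Step 1 of a WebHDFS upload: PUT to this URL with an empty body.
    # The NameNode responds with a 307 redirect whose Location header names
    # the DataNode; step 2 PUTs the actual file contents to that address.
    query = urllib.parse.urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": "true"})
    return "http://{0}:{1}/webhdfs/v1{2}?{3}".format(
        host, WEBHDFS_PORT, hdfs_path, query)
```

Because this is plain HTTP, the same calls can be made from a .NET WebRequest inside an SSIS script task; no Hadoop client installation is needed on the SSIS machine.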
Connecting to Hive
To use the Hive installation, you first need to confirm that Hive is configured properly for access over the network. Several things must be checked to verify this. Start by ensuring that the Hive service is started. Next,