Therefore, writing to Hadoop from Integration Services is best
accomplished by writing a file out to the file system and then moving it into
the Hadoop file system.
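This two-step pattern (write the file locally, then copy it into HDFS) can be sketched as follows. The use of Python and the specific paths are illustrative assumptions; the only command taken as given is the standard Hadoop shell's fs -put operation, which copies a local file into the Hadoop file system:

```python
import subprocess

def build_put_command(local_path, hdfs_path):
    # 'hadoop fs -put <local> <hdfs>' copies a local file into HDFS.
    return ["hadoop", "fs", "-put", local_path, hdfs_path]

def stage_file_to_hdfs(local_path, hdfs_path):
    # Step 1 is assumed to have happened already: an SSIS data flow wrote
    # local_path out through a Flat File Destination.
    # Step 2: move the staged file into HDFS via the Hadoop command line.
    subprocess.run(build_put_command(local_path, hdfs_path), check=True)
```

The same command could just as easily be issued from an SSIS Execute Process task, provided the package runs on a machine where the Hadoop client tools are installed.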
The next sections cover these operations in detail, so that you can configure your SSIS packages to both retrieve data from Hadoop and move data into it.
The instructions assume the use of the Hortonworks Hadoop distribution.
Hadoop can be installed on the same computer where you have SSIS
installed. However, in a production environment, these will likely be on two
different machines. This does present a few additional constraints because
SSIS cannot currently interact directly with Hadoop, and without a local
installation of Hadoop, it cannot access the Hadoop tools. The approaches
to work around these constraints are covered next.
NOTE
At the time of this writing, SSIS does not support direct connectivity to
the Hadoop file system. However, Microsoft has announced that they
are working on a number of additional tasks and components for SSIS
to allow for better interaction with Hadoop. These components are
planned for release on CodePlex (http://www.codeplex.com) as open source.
There are a few options that require a little more setup and some custom coding in SSIS. The Win HDFS Managed Library (http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx) allows you to access the HDFS file system from a script task or component in SSIS. You can also use FTPS (not supported natively in SSIS; it will require a custom task) to upload files to HDFS on the Windows distribution. Finally, you can use WebHDFS, a REST API that can be called from a custom script in SSIS.
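A WebHDFS upload is a two-step exchange: a PUT to the NameNode's REST endpoint returns a redirect naming the DataNode that will accept the file contents, and a second PUT sends the data there. The sketch below only builds the first-step URL; the host name, user name, and file path are illustrative placeholders, and the port assumes the default NameNode HTTP port of Hadoop distributions of this era:

```python
import urllib.parse

# Default NameNode HTTP port in Hadoop 1.x-era distributions (an assumption;
# check your cluster's configuration).
WEBHDFS_PORT = 50070

def webhdfs_create_url(host, hdfs_path, user):
    # Step 1 of a WebHDFS upload: PUT to this URL with an empty body.
    # The NameNode responds with a 307 redirect whose Location header names
    # the DataNode; step 2 PUTs the actual file contents to that address.
    query = urllib.parse.urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": "true"})
    return "http://{0}:{1}/webhdfs/v1{2}?{3}".format(
        host, WEBHDFS_PORT, hdfs_path, query)
```

Because this is plain HTTP, the same calls can be made from a .NET WebRequest inside an SSIS script task; no Hadoop client installation is needed on the SSIS machine.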
Connecting to Hive
To use the Hive installation, you first need to confirm that Hive is configured properly for access over the network. Several things must be checked to verify this. Start by ensuring that the Hive service is started. Next,