data from your choice of sources. Any transformations the data requires can
be applied along the way. As the last step of the data flow, the data needs
to be written to a file. The file's format is determined by what the Hive
system expects. The easiest format to work with from SSIS is a delimited
format, with carriage return/line feed pairs delimiting rows and a column
delimiter such as a comma (,) or vertical bar (|) separating column values.
The SSIS Flat File Destination is designed to write these types of files.
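For example, a three-column export delimited with vertical bars might look
like the following (the table layout and the values are purely illustrative):

1001|Contoso|2012-03-15
1002|Fabrikam|2012-03-16

The matching Hive table would then be declared with FIELDS TERMINATED BY '|'
so that the columns line up when the file is loaded.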
NOTE
The default Hive column delimiter for flat files is Ctrl-A (0x01).
Unfortunately, SSIS doesn't support this delimiter. If at all possible,
use a column delimiter that SSIS supports. If you must use a
non-standard column delimiter, you will need to add a post-processing
step to your package to translate the column delimiters after the file is
produced.
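As a sketch of what that post-processing step might look like, the following
small console utility, which could be invoked from an Execute Process task,
rewrites a pipe-delimited file using Ctrl-A delimiters. The file paths are
passed as arguments and are entirely hypothetical:

using System;
using System.IO;

// Swaps an SSIS-friendly column delimiter (|) for Hive's default
// Ctrl-A (\u0001). Usage: DelimiterFixup.exe <input> <output>
class DelimiterFixup
{
    static void Main(string[] args)
    {
        string inputPath  = args[0];   // e.g. C:\exports\orders.txt
        string outputPath = args[1];   // e.g. C:\exports\orders_hive.txt

        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Replace the column delimiter; row delimiters are untouched.
                writer.WriteLine(line.Replace('|', '\u0001'));
            }
        }
    }
}

Reading and writing line by line keeps memory use flat even when the export
file is large.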
NOTE
If Hive is expecting another format (see Chapter 6 for some of the
possibilities), you might need to implement a custom destination using
a script component. Although a full description of this is beyond the
scope of this chapter, a custom destination lets you fully control the
format of the file produced, so you can match anything that Hive is
expecting.
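As a rough illustration, the skeleton of such a script component destination
might look like the following. The UserComponent base class and the
Input0Buffer type are generated for you by the SSIS designer; the column
names (OrderId, Amount) and the output path are assumptions for this sketch:

using System.IO;

public class ScriptMain : UserComponent
{
    private StreamWriter writer;

    public override void PreExecute()
    {
        base.PreExecute();
        // Hypothetical output location; in a real package you would
        // typically read this from a variable or connection manager.
        writer = new StreamWriter(@"\\fileshare\exports\orders.txt");
        writer.NewLine = "\n";   // match Hive's default row terminator
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Emit each row in exactly the layout Hive expects --
        // here, Ctrl-A between the columns.
        writer.Write(Row.OrderId);
        writer.Write('\u0001');
        writer.WriteLine(Row.Amount);
    }

    public override void PostExecute()
    {
        writer.Close();
        base.PostExecute();
    }
}

Because you control every byte written, the same pattern extends to any
delimiter or record layout Hive can be configured to read.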
Once the file is produced, you can use a File System task to copy it to
a network location that is accessible to both your SSIS server and your
Hadoop cluster. The next step is to call the process that copies the file
into HDFS. This is done through an Execute Process task. Assuming that you
are executing the Hadoop copy on a remote system using PsExec, you configure
the task with the following property settings. (You might need to adjust your
file locations):
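A configuration along these lines, in which the server name, credentials,
and paths are all placeholders, might look like this:

Executable:        C:\Tools\PsExec.exe
Arguments:         \\hadoop01 -u HADOOPDOM\loader -p ********
                   hadoop fs -put \\fileshare\exports\orders.txt
                   /data/orders/orders.txt
WorkingDirectory:  C:\Tools

Here PsExec runs the hadoop fs -put command on the remote node hadoop01,
which copies the staged file from the network share into the HDFS directory
/data/orders.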