WARNING
Expressions can be used in SSIS packages in any version of SSIS.
However, be aware of an important restriction in versions before SSIS
2012. In those versions, expressions could not operate on any string
longer than 4,000 characters and could not return a result longer than
4,000 characters. Long SQL statements built with expressions can
exceed this limit. In that case, the only solution is to reduce the
length of your SQL. SSIS 2012 removed the restriction, which makes it
much easier to work with long SQL queries.
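If packages must still run on a pre-2012 version, it can help to check generated SQL against the limit before it reaches an expression. The following is a minimal Python sketch of such a check, run outside SSIS. The function name and sample queries are hypothetical, and it assumes that counting characters with len() is a close enough approximation of how SSIS measures string length.

```python
# Limit described in the warning above (pre-SSIS 2012 expressions).
SSIS_PRE_2012_EXPRESSION_LIMIT = 4000


def fits_pre_2012_limit(sql: str, limit: int = SSIS_PRE_2012_EXPRESSION_LIMIT) -> bool:
    """Return True if the SQL text is no longer than the character limit."""
    return len(sql) <= limit


# Hypothetical sample statements for illustration only.
short_query = "SELECT * FROM customers"
long_query = "SELECT " + ", ".join(f"col{i}" for i in range(1000)) + " FROM wide_table"

for name, sql in [("short_query", short_query), ("long_query", long_query)]:
    status = "OK" if fits_pre_2012_limit(sql) else "too long for pre-2012 expressions"
    print(f"{name}: {len(sql)} characters, {status}")
```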
After your Hive data has been retrieved into SSIS, you can use it as you
would any other data source, applying any of the SSIS transformations to it
and sending it to any destination supported by SSIS. However, data retrieved
from Hive often contains null values, so it is worthwhile to make sure
that your SSIS package handles null values appropriately.
NOTE
Hive tends to return null values because it applies schema on read
rather than on write. That is, if Hive reads a value from one of the
underlying files that make up a table, and that value doesn't match the
expected data type of the column, or doesn't have appropriate
delimiters, Hive will return a null value for the column. This is good in
that it doesn't stop your query from processing, but it can result in
some strange behavior in SSIS if you don't have appropriate null
handling in your package.
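The schema-on-read behavior described in the note can be illustrated with a small Python sketch. This is not how Hive is implemented. It is a simplified imitation, and the function name, the comma delimiter, and the sample lines are all assumptions made for the example.

```python
from typing import Callable, List, Optional


def read_with_schema(
    line: str,
    schema: List[Callable[[str], object]],
    delimiter: str = ",",
) -> List[Optional[object]]:
    """Apply a schema to one raw line at read time.

    A field that is missing or cannot be converted to the expected type
    becomes None instead of raising an error, imitating Hive's NULLs.
    """
    fields = line.split(delimiter)
    row: List[Optional[object]] = []
    for position, convert in enumerate(schema):
        try:
            row.append(convert(fields[position]))
        except (IndexError, ValueError):
            row.append(None)
    return row


# Expected column types: an integer, a string, and a float.
schema = [int, str, float]

raw_lines = [
    "1,widget,9.99",    # matches the schema
    "two,gadget,4.50",  # first value is not an integer
    "3|gizmo|1.25",     # wrong delimiter, so the columns cannot be split
]

rows = [read_with_schema(line, schema) for line in raw_lines]
for row in rows:
    print(row)
```

The second and third lines still produce rows, but with None in place of the values that could not be read. The same rows arriving in SSIS would carry nulls that the package must handle.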
Loading Data into Hadoop
As noted earlier, SSIS cannot write to Hive directly using ODBC. The
alternative is to create a file with the appropriate file format and copy it
directly to the Hadoop file system. If it is copied to the directory that Hive
uses for managing the table, your data will show up in the Hive table.
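As a rough sketch of what "the appropriate file format" can mean, the Python code below writes rows in the layout Hive uses by default for text tables: fields separated by the Ctrl-A character, one row per line, and nulls written as \N. It assumes the target table was created with those defaults. A table defined with a different row format needs a matching delimiter. The function names, file name, and sample rows are hypothetical.

```python
from typing import Iterable, Optional, Sequence

FIELD_DELIMITER = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables
NULL_MARKER = "\\N"       # Hive's default text representation of NULL


def format_row(values: Sequence[Optional[object]]) -> str:
    """Join one row's values with the field delimiter, writing None as \\N."""
    return FIELD_DELIMITER.join(
        NULL_MARKER if value is None else str(value) for value in values
    )


def write_hive_text_file(path: str, rows: Iterable[Sequence[Optional[object]]]) -> None:
    """Write rows to a local file, one newline-terminated line per row."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write(format_row(row) + "\n")


if __name__ == "__main__":
    # Hypothetical sample data and output file name.
    sample_rows = [(1, "widget", 9.99), (2, None, 4.5)]
    write_hive_text_file("customers.txt", sample_rows)
```

The resulting file can then be copied into the table's directory, for example with hdfs dfs -put. The destination path depends on how the cluster and the table are configured, so it is not shown here.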