Databases Reference
In-Depth Information
6.8. Using the Correct NULL String in Hive
Problem
You've imported your data into Hive using Sqoop and now you're trying to query it.
However, you can see that some columns contain the correct NULL value but some con‐
tain the string literal null and are not selected using the expression column IS NULL .
Solution
Due to differences in the default NULL substitution string between Sqoop and Hive, you
have to override the Sqoop default substitution strings to be compatible with Hive. For
example:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--hive-import \
--null-string '\\N' \
--null-non-string '\\N'
Discussion
Hive, by default, expects that the NULL value will be encoded using the string constant
\N . Sqoop, by default, encodes it using the string constant null . To rectify the mismatch,
you'll need to override Sqoop's default behavior with Hive's.
This issue with different NULL substitution strings is very tricky to debug. Depending
on the way Hive parses data, it seems to be working sometimes and at other times does
not. When Hive is not able to parse certain columns, it will return NULL instead of
throwing an exception and failing the entire query execution. Let's investigate this a bit
further with an example. Consider the following table:
CREATE TABLE tbl ( id int , txt varchar ( 50 ));
INSERT INTO tbl VALUES ( NULL , NULL );
The table tbl has two columns, one numeric and one text, with a row that has NULL
stored in both columns. Without any special parameters, Sqoop will import this table
as a file with a single row null,null .
When Hive reads this line, it will first separate each column using the comma as a
separator. Subsequently, it will start processing the value for each column. The first
column is of type int and contains the string constant null , which is not a valid number
value. Instead of throwing a parsing exception at this point, Hive will substitute NULL
for this cell. The second column is of type string and contains the string constant
Search WWH ::




Custom Search