2.10. Encoding NULL Values
Problem
Sqoop encodes database NULL values using the string constant null. Your downstream
processing (Hive queries, a custom MapReduce job, or a Pig script) uses a different constant
for encoding missing values, and you would like to override the default.
Solution
You can override the NULL substitution string with the --null-string and
--null-non-string parameters to any arbitrary value. For example, use the following
command to override it to \N:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--null-string '\\N' \
--null-non-string '\\N'
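
The command above passes '\\N' with two backslashes, although the stated target is \N. As a minimal sketch of the shell side only (the variable names are illustrative and Sqoop is not invoked), the snippet below shows that single quotes hand both backslashes to the program unchanged, so any reduction to a single backslash has to happen inside Sqoop and not in the shell.

```shell
# Single quotes: the shell performs no escape processing, so the program
# receives three characters: backslash, backslash, N.
single_quoted='\\N'
printf '%s\n' "$single_quoted"      # prints \\N
printf '%s\n' "${#single_quoted}"   # prints 3

# Double quotes: the shell itself collapses \\ to \, so only two characters
# (backslash, N) would reach the program.
double_quoted="\\N"
printf '%s\n' "$double_quoted"      # prints \N
printf '%s\n' "${#double_quoted}"   # prints 2
```

Keeping the single quotes from the Solution therefore matters: switching to double quotes changes what Sqoop receives.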
Discussion
Almost all relational databases allow you to define columns as optional, allowing the
columns to be associated with no value. This missing information is usually referred to
as the NULL value. For example, a BOOL column generally contains only two distinct
values: true and false. Additionally, the column can contain the value NULL if during
table creation you've explicitly allowed that column to be optional. Databases usually
store the NULL value as an extra bit in addition to the column's usual data. Because Sqoop
can import data into formats that don't natively support the NULL value (e.g., CSV files),
the missing value has to be encoded into the data itself. By default,
Sqoop uses the string constant null (lowercase) to represent the missing value.
This default constant might not suit your needs if your data can contain the same string
constant as a regular value or if your downstream processing is expecting a different
substitution string constant.
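
The collision described above can be sketched with a few made-up rows. The sample data and the helper name count_marker are hypothetical; the rows only imitate comma-separated text output with an id and a name column, and no Sqoop installation is needed to run the snippet.

```shell
# Hypothetical rows with the default substitution string.
# Row 2 has a missing name; row 3 holds a name that is literally the text "null".
default_output='1,Palo Alto
2,null
3,null'

# The same rows if the import had used --null-string '\\N':
# only the missing value is written as \N.
overridden_output='1,Palo Alto
2,\N
3,null'

# Count rows whose second field is exactly the given marker.
# The marker travels through the environment so awk does not
# reinterpret the backslash as an escape sequence.
count_marker() {
  marker="$1" awk -F',' '$2 == ENVIRON["marker"] { n++ } END { print n + 0 }'
}

printf '%s\n' "$default_output" | count_marker 'null'    # prints 2
printf '%s\n' "$overridden_output" | count_marker '\N'   # prints 1
```

With the default marker, a downstream job sees two "missing" names even though only one value was NULL in the database; with \N the real value survives untouched.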
To allow easier integration with additional Hadoop ecosystem components, Sqoop
distinguishes between two different cases when dealing with missing values. For text-based
columns that are defined with type VARCHAR, CHAR, NCHAR, TEXT, and a few others, you
can override the default substitution string using the parameter --null-string. For all
other column types, you can override the substitution string with the
--null-non-string parameter. Some of the connectors might not support different substitution
strings for different column types and thus might require you to specify the same value
in both parameters.