Generated Code
In addition to writing the contents of the database table to HDFS, Sqoop also provides you with a generated Java source file (widgets.java) written to the current local directory. (After running the sqoop import command shown earlier, you can see this file by running ls widgets.java.)
As you'll learn in Imports: A Deeper Look, Sqoop can use generated code to handle the deserialization of table-specific data from the database source before writing it to HDFS.
The generated class (widgets) can hold a single record retrieved from the imported table. It lets you manipulate such a record in MapReduce or store it in a SequenceFile in HDFS. (SequenceFiles written by Sqoop during the import process store each imported row in the “value” element of the SequenceFile's key-value pair format, using the generated class.)
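To illustrate what "holding a single record" and storing it as a SequenceFile value involve, here is a minimal, hand-written sketch. It is not Sqoop's actual generated code, which is longer and depends on the Sqoop and Hadoop libraries. The three columns (id, widgetName, version) and the roundTrip helper are hypothetical. The write/readFields pair mirrors the method signatures of Hadoop's Writable interface but uses only java.io, so the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-in for a Sqoop-generated record class: one instance holds one row.
// The column names and types are assumptions for illustration only.
class Widget {
    private Integer id;
    private String widgetName;
    private Integer version;

    Widget() { }

    Widget(Integer id, String widgetName, Integer version) {
        this.id = id;
        this.widgetName = widgetName;
        this.version = version;
    }

    String getWidgetName() { return widgetName; }

    // Serialize this record; the signature mirrors Hadoop's Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        writeNullableInt(out, id);
        out.writeBoolean(widgetName == null);  // null flag precedes the value
        if (widgetName != null) {
            out.writeUTF(widgetName);
        }
        writeNullableInt(out, version);
    }

    // Read the fields back in the same order they were written;
    // the signature mirrors Hadoop's Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        id = readNullableInt(in);
        widgetName = in.readBoolean() ? null : in.readUTF();
        version = readNullableInt(in);
    }

    private static void writeNullableInt(DataOutput out, Integer value) throws IOException {
        out.writeBoolean(value == null);  // null flag precedes the value
        if (value != null) {
            out.writeInt(value);
        }
    }

    private static Integer readNullableInt(DataInput in) throws IOException {
        return in.readBoolean() ? null : in.readInt();
    }

    // Hypothetical helper: write the record to bytes and read it back, a simplified
    // stand-in for storing it as a SequenceFile value and reading it later.
    static Widget roundTrip(Widget record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            record.write(out);
            out.flush();
            Widget copy = new Widget();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Widget)) {
            return false;
        }
        Widget w = (Widget) o;
        return Objects.equals(id, w.id)
                && Objects.equals(widgetName, w.widgetName)
                && Objects.equals(version, w.version);
    }

    @Override
    public int hashCode() { return Objects.hash(id, widgetName, version); }

    // Comma-delimited text form of the record.
    @Override
    public String toString() { return id + "," + widgetName + "," + version; }
}
```

The explicit null flag before each field is one simple way to preserve SQL NULLs in a binary encoding; the real generated class has its own scheme and many more methods.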
You probably don't want to name your generated class widgets, since each instance of the class refers to only a single record. We can use a different Sqoop tool to generate source code without performing an import; this tool still examines the database table to determine the appropriate data types for each field:
% sqoop codegen --connect jdbc:mysql://localhost/hadoopguide \
> --table widgets --class-name Widget
The codegen tool simply generates code; it does not perform the full import. We specified that we'd like it to generate a class named Widget; this will be written to Widget.java. We also could have specified --class-name and other code-generation arguments during the import process we performed earlier. This tool can be used to regenerate code if you accidentally remove the source file, or to generate code with different settings than were used during the import.
If you're working with records imported to SequenceFiles, it is inevitable that you'll need to use the generated classes (to deserialize data from the SequenceFile storage). You can work with text-file-based records without using generated code, but as we'll see in Working with Imported Data, Sqoop's generated code can handle some tedious aspects of data processing for you.
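For a sense of what working with text-file records by hand looks like, here is a minimal sketch that parses one line of a text import without any generated code. It assumes Sqoop's default text delimiters (a comma between fields, a newline between records, no enclosing or escape characters) and that null columns were written as the literal string null; the three-column layout and the WidgetTextRecord class are hypothetical.

```java
// Hypothetical hand-written parser for one line of a text-file import.
// Assumes comma-separated fields, no enclosing or escape characters,
// and nulls written as the literal string "null".
final class WidgetTextRecord {
    final Integer id;
    final String widgetName;
    final Integer version;

    private WidgetTextRecord(Integer id, String widgetName, Integer version) {
        this.id = id;
        this.widgetName = widgetName;
        this.version = version;
    }

    // Split one record into its columns and convert each to a Java type by hand.
    static WidgetTextRecord parse(String line) {
        // A limit of -1 keeps trailing empty fields instead of silently dropping them.
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                    "expected 3 fields but found " + fields.length + ": " + line);
        }
        return new WidgetTextRecord(
                parseNullableInt(fields[0]),
                nullIfMarker(fields[1]),
                parseNullableInt(fields[2]));
    }

    // Map the assumed null marker back to a Java null.
    private static String nullIfMarker(String field) {
        return "null".equals(field) ? null : field;
    }

    private static Integer parseNullableInt(String field) {
        String value = nullIfMarker(field);
        return value == null ? null : Integer.valueOf(value.trim());
    }
}
```

Even this small parser has to handle splitting, type conversion, and null markers itself, and with no enclosing or escape characters a comma inside a value shifts every later column; the field-count check only detects that case, it cannot recover from it.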
Additional Serialization Systems
Recent versions of Sqoop support Avro-based serialization and schema generation as well (see Chapter 12), allowing you to use Sqoop in your project without integrating with generated code.