Transferring Data with Sqoop
Sqoop is a tool designed to import data into Hadoop from other data stores,
particularly relational databases, and to export data from Hadoop back to
them. This is useful for moving data from a SQL Server database into Hadoop
or for retrieving data from Hadoop and storing it in SQL Server. Sqoop uses
MapReduce to do the actual data processing, so it takes full advantage of the
parallel processing capabilities of Hadoop.
One of the reasons that Sqoop is easy to use is that it infers the schema from
the relational data store that it is interacting with. Because of this, you don't
have to specify a lot of information to use it. Instead, it determines column
names, types, and formats from the relational definition of the table.
Behind the scenes, Sqoop generates code classes that contain the logic to
read and write the relational data. This means that most operations are
performed on a row-by-row basis, so it may not deliver the best possible
performance. Certain databases, like MySQL, do have options to use bulk
interfaces with Sqoop, but currently, SQL Server does not.
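One way to inspect what Sqoop infers from a table is its codegen tool, which writes the generated class to disk without moving any data. The following is a minimal sketch, assuming a Unix-like shell, the placeholder connection details used in the import example later in this section, and a hypothetical output directory ./sqoop-gen. It only prints the command unless RUN is set to 1.

```shell
#!/bin/sh
# Placeholder connection details; replace with your own server, database, and login.
JDBC_URL="jdbc:sqlserver://Your_SqlServer;database=MsBigData;Username=demo;Password=your_password;"
OUT_DIR="./sqoop-gen"   # hypothetical directory for the generated .java source

# Build the argument list for the codegen tool.
set -- sqoop codegen --connect "$JDBC_URL" --table Customers --outdir "$OUT_DIR"
CODEGEN_PREVIEW="$*"

if [ "${RUN:-0}" = "1" ]; then
  "$@"                      # run Sqoop for real
else
  echo "$CODEGEN_PREVIEW"   # dry run: show the command only
fi
```

If the command runs successfully, the output directory should contain a Java source file named after the table, with one field per column that Sqoop detected.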
Sqoop uses Java Database Connectivity (JDBC) components to make connections
to relational databases. These components need to be installed on the
computer where Sqoop is run. Microsoft provides a JDBC driver archive for
SQL Server at http://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx.
After downloading the archive, you need to extract the appropriate .jar
file into your Sqoop lib directory (on a Hortonworks default installation,
C:\hdp\hadoop\sqoop-1.4.3.1.3.0.0-0380\lib) so that Sqoop
can locate the driver.
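The copy step can be sketched as a small helper. This is only an illustration, assuming a Unix-like shell rather than the Windows installation mentioned above. The function name install_jdbc_driver and the jar name sqljdbc4.jar are assumptions (the actual file name depends on the driver version you download).

```shell
#!/bin/sh
# Copy a JDBC driver jar into the Sqoop lib directory after basic checks.
# Usage: install_jdbc_driver <path-to-jar> <sqoop-lib-dir>
install_jdbc_driver() {
  jar_path="$1"
  lib_dir="$2"
  # Fail early if either path is wrong, so Sqoop is not left without a driver silently.
  [ -f "$jar_path" ] || { echo "driver jar not found: $jar_path" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "Sqoop lib directory not found: $lib_dir" >&2; return 1; }
  cp "$jar_path" "$lib_dir/" && echo "installed $(basename "$jar_path") in $lib_dir"
}

# Example call (paths are hypothetical):
# install_jdbc_driver ./sqljdbc4.jar "$SQOOP_HOME/lib"
```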
Copying Data from SQL Server
To move data from SQL Server to Hadoop, you use the sqoop import
command. A full example is shown here:
sqoop import --connect
"jdbc:sqlserver://Your_SqlServer;database=MsBigData;Username=demo;Password=your_password;"
--table Customers
-m 1
--target-dir /MsBigData/Customers
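The command above uses a single mapper. To use more of the parallelism that MapReduce offers, the mapper count can be raised. Sqoop then needs a column to partition the rows on, which is the table's primary key by default or the column named with --split-by. The sketch below only prints the resulting command lines. It assumes a Unix-like shell, and both the helper build_import_cmd and the column CustomerID are hypothetical.

```shell
#!/bin/sh
# Placeholder connection details carried over from the example above.
JDBC_URL="jdbc:sqlserver://Your_SqlServer;database=MsBigData;Username=demo;Password=your_password;"

# Print a sqoop import command line for a mapper count and an optional split column.
build_import_cmd() {
  mappers="$1"
  split_col="$2"
  cmd="sqoop import --connect \"$JDBC_URL\" --table Customers --target-dir /MsBigData/Customers -m $mappers"
  # With more than one mapper, name the column Sqoop should use to divide the rows.
  if [ "$mappers" -gt 1 ] && [ -n "$split_col" ]; then
    cmd="$cmd --split-by $split_col"
  fi
  echo "$cmd"
}

build_import_cmd 1              # single mapper, as in the example above
build_import_cmd 4 CustomerID   # four mappers, split on a hypothetical key column
```

A numeric, evenly distributed column is usually the better choice for splitting, because each mapper then receives a similar share of the rows.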