Effective Big Data ETL with SSIS, Pig, and Sqoop - Microsoft Big Data Solutions


• Executable: C:\Sysinternals\PsExec.exe

• Arguments: \\Your_Hadoop_Server C:\hdp\hadoop\hadoop-1.2.0.1.3.0.0-0380\bin\hadoop.cmd dfs -put \\CommonNetworkLocation\LandingZone\Customer1.txt /user/MsBigData/Customer1.txt

You can configure the Execute Process task with property expressions to make this process more dynamic. If you are moving multiple files, you can also place the task inside a Foreach Loop container in SSIS so that it runs once for each file.
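As a rough illustration of what an expression inside the loop needs to produce, the minimal Python sketch below builds the PsExec Arguments string for each file. The server name, Hadoop path, and HDFS directory are the placeholder values from the example above; the helper name `psexec_arguments` and the second file `Customer2.txt` are hypothetical. In the package itself you would write this as an SSIS expression over a loop variable, so treat the code only as a description of the target string.

```python
from pathlib import PureWindowsPath

# Placeholder values copied from the example above; replace with your own.
HADOOP_SERVER = r"\\Your_Hadoop_Server"
HADOOP_CMD = r"C:\hdp\hadoop\hadoop-1.2.0.1.3.0.0-0380\bin\hadoop.cmd"
HDFS_TARGET_DIR = "/user/MsBigData"


def psexec_arguments(local_file):
    """Build the PsExec Arguments string that copies one file into HDFS.

    Hypothetical helper: it mimics the string an SSIS expression on the
    Execute Process task's Arguments property would need to produce.
    """
    # Reuse the original file name as the HDFS file name.
    file_name = PureWindowsPath(local_file).name
    return (f"{HADOOP_SERVER} {HADOOP_CMD} dfs -put "
            f"{local_file} {HDFS_TARGET_DIR}/{file_name}")


# Stand-in for the files a Foreach Loop would enumerate (Customer2.txt is hypothetical).
landing_zone_files = [
    r"\\CommonNetworkLocation\LandingZone\Customer1.txt",
    r"\\CommonNetworkLocation\LandingZone\Customer2.txt",
]

for path in landing_zone_files:
    print(psexec_arguments(path))
```

Keeping the server, executable, and target directory in one place mirrors the design of an SSIS package that stores them in variables, so only the per-file part of the string changes on each iteration.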

Getting the Best Performance from SSIS

As touched on earlier, one way to improve SSIS performance with big data is to minimize the amount of data that SSIS actually has to process. When querying from Hive, always retrieve only the rows and columns that are essential.
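To make this concrete, the minimal sketch below builds a HiveQL query that names only the needed columns and filters rows with a WHERE clause, rather than using SELECT *. The table name, column names, and predicate are hypothetical, and the helper `build_hive_query` assumes the predicate comes from a trusted source, because it is concatenated directly into the query text.

```python
def build_hive_query(table, columns, predicate=None):
    """Return a HiveQL SELECT that retrieves only the listed columns.

    Naming columns explicitly (instead of SELECT *) and filtering with a
    WHERE clause reduces the data Hive sends back to the SSIS source.
    The predicate is inserted verbatim, so it must come from a trusted source.
    """
    if not columns:
        raise ValueError("name at least one column; avoid SELECT *")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if predicate:
        sql += f" WHERE {predicate}"
    return sql


# Hypothetical table and columns; substitute your own Hive schema.
query = build_hive_query(
    "customer_orders",
    ["customer_id", "order_date", "order_total"],
    "order_date >= '2013-01-01'",
)
print(query)
# SELECT customer_id, order_date, order_total FROM customer_orders WHERE order_date >= '2013-01-01'
```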

Another way to improve performance in SSIS is to increase parallelism. This has the most benefit when you are writing to Hadoop. If you set up multiple parallel data flows that each produce data files, you can invoke multiple dfs -put commands simultaneously to move those files into the Hadoop file system. This takes advantage of Hadoop's ability to scale out across multiple nodes.
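Outside SSIS, the same idea can be sketched in Python: one dfs -put per file, with several running at once from a thread pool. This is a minimal sketch under stated assumptions: the `hadoop` command can be run directly on the machine executing the script (on a Windows HDP node you would likely pass the full path to `hadoop.cmd` through the `hadoop` parameter), and the staging paths and helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import PureWindowsPath


def put_command(local_file, hdfs_dir, hadoop="hadoop"):
    """Return the argument list for one 'hadoop dfs -put' call (hypothetical helper)."""
    file_name = PureWindowsPath(local_file).name
    return [hadoop, "dfs", "-put", local_file, f"{hdfs_dir}/{file_name}"]


def run_and_get_exit_code(command):
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(command).returncode


def parallel_put(local_files, hdfs_dir, max_workers=4,
                 run=run_and_get_exit_code, hadoop="hadoop"):
    """Copy several files into HDFS concurrently.

    Each file gets its own put command, and up to max_workers run at once.
    Returns the exit codes in the same order as local_files.
    """
    commands = [put_command(f, hdfs_dir, hadoop) for f in local_files]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Hypothetical files written by parallel data flows; dry run prints the commands only.
staged = [r"D:\Staging\Customer1.txt", r"D:\Staging\Customer2.txt"]
for cmd in (put_command(f, "/user/MsBigData") for f in staged):
    print(" ".join(cmd))
```

Calling `parallel_put(staged, "/user/MsBigData")` would actually run the uploads. Capping `max_workers` keeps a burst of uploads from overwhelming the client machine or network; a sensible value depends on your environment.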

Increasing parallelism for packages that read from Hive can have mixed results. A Hive query already runs in parallel, because Hive spreads the processing out across the cluster. You can try running multiple queries simultaneously using different ODBC source components in SSIS, but it generally works better to issue a single query and let Hive determine how much parallelism to use.

SSIS is a good way to interact with Hadoop, particularly for querying information. It's also a familiar tool to those in the SQL Server space. Thanks to the number of sources and destinations it supports, it can prove very useful when integrating your big data with the rest of your organization.
