Installing Open Studio for Big Data
You can find Open Studio on Talend's website at www.talend.com, along with a number of other big data offerings,
including a big data sandbox and the big data Studio and Enterprise editions. For the chapter's example, I use the free,
30-day trial version downloaded from the Talend website rather than the sandbox version. This is because I plan to
connect Talend to my existing Hadoop cluster, tackling any problems as they arise. However, you may find the
sandbox version useful because it contains sample code and a fully working Hadoop cluster. Also, I create a Pig-based
Map Reduce job because the full Java-based Map Reduce functionality is available only in the Enterprise product.
When I attempt to install these big data ETL tools, I always try to install them on Windows machines first, as I
hope to use them as clients connecting to my Linux-based Hadoop clusters. A shell-based error prevented me from
doing this at this time, so instead I install the Talend software on the CentOS 6 Linux host hc1nn and I configure it to
connect to the CDH5 Hadoop cluster on nc2nn. (See the “Potential Errors” section for details on this error, which calls
for a fix to be added to future Cloudera releases.)
For this installation, I download the Talend Open Studio for Big Data 5.5 from the URL www.talend.com/download.
I select the Big Data tab and download the Open Studio software, as shown in Figure 10-24. (I added red
indicator boxes to the options that I need.) The download took an hour for me; the length of download time depends
on your bandwidth.
Figure 10-24. Software download for Talend
I place the software in a directory called talend in the Linux hadoop user account's home directory. The
Linux pwd command confirms the location:
[hadoop@hc1nn talend]$ pwd
/home/hadoop/talend
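The directory setup can be sketched as follows. This is a minimal sketch rather than the exact commands used for the chapter: it assumes the directory lives under the current user's home directory (which is /home/hadoop for the hadoop account), and the TALEND_DIR variable name is introduced here purely for illustration.

```shell
# Assumption: the install directory is "talend" under the current user's home
# directory, matching the /home/hadoop/talend path for the hadoop account.
# TALEND_DIR is a hypothetical variable name used only in this sketch.
TALEND_DIR="$HOME/talend"

# Create the directory; -p makes this a no-op if it already exists.
mkdir -p "$TALEND_DIR"

# Move into the directory and print the working directory to confirm it.
cd "$TALEND_DIR"
pwd
```

Run as the hadoop user, the final pwd should report the same path as shown in the session above.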
 