If the wrong type is specified for a key field, an error message similar to the following is generated:
{"type":"TASK_FAILED","event":{"org.apache.hadoop.mapreduce.jobhistory.TaskFailed":{"taskid":"task_1
412385899407_0008_m_000000","taskType":"MAP","finishTime":1412403861583,"error":",
Error: java.io.IOException: org.pentaho.hadoop.mapreduce.converter.TypeConversionException: \n
Error converting to Long: 1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202\n
For input string: \"1995,ACURA,INTEGRA,SUBCOMPACT,1.8,4,A4,X,10.2,7,28,40,1760,202\"\n\n
In this case, a string key field was incorrectly being treated as a numeric (Long) value.
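To make the cause easier to see, the sketch below expresses the same mismatch in plain Hadoop Java terms; the class, method, and field names are purely illustrative and are not part of the PDI job. The mapper emits the whole CSV record as a Text key, so declaring the map output key class as LongWritable instead of Text produces exactly this kind of conversion failure. In the PDI transformation, the place to check is the key and value types defined on the Map Reduce input and output steps.

// Illustrative Hadoop mapper: the key is the whole CSV record, i.e. a string.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class KeyTypeExample {

  public static class CsvMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text record, Context context)
        throws IOException, InterruptedException {
      // The CSV record is used as the key, so its declared type must be Text (a string).
      context.write(record, new IntWritable(1));
    }
  }

  // Hypothetical job setup: the declared key class must match what the mapper emits.
  public static void configure(Job job) {
    job.setMapperClass(CsvMapper.class);
    job.setMapOutputKeyClass(Text.class);            // correct: the key is a string
    // job.setMapOutputKeyClass(LongWritable.class); // wrong: causes "Error converting to Long"
    job.setMapOutputValueClass(IntWritable.class);
  }
}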
An error in the configuration of the PDI Map Reduce job can cause the following error message:
commons.vfs.FileNotFoundException: Could not read from
"file:///yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0001/
container_1412471201309_0001_01_000013/job.jar"
/yarn/nm/usercache/mikejf12/appcache/application_1412471201309_0001/
container_1412471201309_0001_01_000001
because it is a not a file.
Although it looks like some kind of Hadoop configuration error, it is not. It was again caused by setting the wrong
data type on Map Reduce variable values. Just follow the example installation and configuration in this section and
you will be fine.
Finally, a lack of available memory on the Hadoop Resource Manager host Linux machine produces an error like
the following:
2014-10-07 18:08:57,674 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
To resolve a problem like this, try reducing the Resource Manager memory usage in the CDH Manager so that it does not exceed the memory available on the host.
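If you manage the cluster configuration files directly rather than through the CDH Manager screens, the YARN memory limits below are the usual ones to reduce. This yarn-site.xml fragment is only a sketch with illustrative values; choose figures that fit within the physical memory actually available on each host.

<!-- yarn-site.xml (illustrative values only; size to the memory on your hosts) -->
<property>
  <!-- Total memory YARN may hand out as containers on each node -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <!-- Largest single container the scheduler will allocate -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <!-- Smallest container allocation; requests are rounded up to this -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>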
Now that you understand how to develop a Map Reduce job using Pentaho, let's see how to create a similar job using Talend Open Studio. The illustrative example uses the same Hadoop CDH5 cluster both as a data source and for processing.
Talend Open Studio
Talend offers a popular big data visual ETL tool called Open Studio. Like Pentaho, Talend gives you the ability to create Map Reduce jobs against existing Hadoop clusters in a logical, step-by-step manner, by pulling pre-defined modules from a palette and linking them into an ETL chain. I describe how to source, install, and use Open Studio, as well as how to create a Pig-based Map Reduce job. Along the way, I point out a few common errors and their solutions.
 