This issue was caused by my not installing all of the third-party libraries when prompted to do so. The solution is simple: click Finish, accept the licensing, and then wait patiently for the libraries to install.
Two errors were caused by my setting up the Hive connection incorrectly. Specifically, I received the following error:
Failed to run analysis: rawtrans_analysys
Error message:
Error while processing statement: Failed: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
and the following error appeared in the Hive log file /var/log/hive/hadoop-cmf-hive-HIVEMETASTORE-hc2nn.semtech-solutions.co.nz.log.out:
assuming we are not on mysql: ERROR: syntax error at or near "@@"
The port number should have been set to 10000, the HiveServer2 port. I had used the value 9083, which is the port defined in the property hive.metastore.uris in the file hive-site.xml under the directory /etc/hive/conf.cloudera.hive.
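A quick way to see where the 9083 value comes from is to pull the metastore URI out of hive-site.xml. The sketch below is illustrative only: it recreates a minimal hive-site.xml fragment (the host name and file location are assumptions, not copies of the real cluster file) and extracts the port, which belongs to the metastore, not to HiveServer2.

```shell
# Recreate a minimal hive-site.xml fragment for illustration
# (real file would be under /etc/hive/conf.cloudera.hive/).
cat > /tmp/hive-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hc2nn:9083</value>
  </property>
</configuration>
EOF

# Extract the port after the last colon in the thrift URI.
port=$(grep -A1 'hive.metastore.uris' /tmp/hive-site-sample.xml \
       | grep '<value>' | sed 's/.*://; s#</value>.*##')
echo "metastore port: $port"   # this is the metastore port; HiveServer2 listens on 10000 by default
```

The point of the check: a thrift URI ending in 9083 identifies the metastore service, so a Talend Hive connection should not reuse that number for its HiveServer2 address.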
There was also the following error regarding an RPM component:
There was an error creating the RPM file:
Could not find valid RPM application:
RPM-building tools are not available on the system
The error occurred because an RPM build component was missing from the CentOS Linux host on which Talend was installed. The solution was to install the missing component using the yum install command.
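A diagnostic along the following lines confirms whether the build tool is present. The assumption here (not stated in the original error) is that the missing piece is the rpmbuild binary, which on CentOS is provided by the rpm-build package.

```shell
# Check whether the rpmbuild tool is on the PATH; if not, suggest the fix.
# Package name rpm-build is an assumption for CentOS; confirm with: yum search rpm-build
if command -v rpmbuild >/dev/null 2>&1; then
    echo "rpmbuild found: $(command -v rpmbuild)"
else
    echo "rpmbuild not found - run: sudo yum install rpm-build"
fi
```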
Finally, this short error occurred while I was installing the Talend client software; it implied that the Talend install file called “dist” was corrupted:
Unable to execute validation program
I don't know how it happened, but I solved the problem by removing the Talend software release directory and
extracting the tar archive a second time.
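The remove-and-re-extract fix can be sketched as follows. The file and directory names here are hypothetical placeholders, and the tar -t listing step is an addition of mine: it verifies that the archive itself is readable before the old extraction is deleted.

```shell
TARBALL="talend-install.tar.gz"   # assumed name; substitute the actual download
RELDIR="talend-release"           # assumed name of the extracted release directory

# -t lists the archive without extracting; a corrupt download fails here.
if tar -tzf "$TARBALL" >/dev/null 2>&1; then
    rm -rf "$RELDIR"              # remove the possibly corrupted extraction
    tar -xzf "$TARBALL"           # extract a fresh copy
    echo "re-extracted $TARBALL"
else
    echo "archive unreadable - download $TARBALL again"
fi
```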
Summary
Relational database systems encounter data-quality problems, and they use data-quality rules to solve those
problems. Hadoop Hive has the potential to hold an extremely large amount of data—a great deal larger than
traditional relational database systems and at a lower unit cost. As the data volume rises, however, so does the
potential for encountering data-quality issues.
Tools like Talend, and the reports it can produce, offer the ability to connect to Hive and, via external tables, to HDFS-based data. Talend can run user-defined data-quality checks against that Hive data. The examples presented here offer only a small taste of the functionality that is available. Likewise, Splunk/Hunk can generate reports and create dashboards to monitor data. After working through the Splunk/Hunk and Talend application examples provided in this chapter, you might consider investigating the Tableau and Pentaho applications for big data as well.
You now have the tools to begin creating your own Hadoop-based systems. As you go forward, remember to
check the Apache and tool supplier websites. Consult their forums and ask questions if you encounter problems. As
you find your own solutions, post them as well, so as to help other members of the Hadoop community.