Figure 11-20. Repository view of Talend-Hive database connection
At this point, I can create a range of reports based on the underlying Hive table data.
Generating Reports
By using the Splunk/Hunk product at the start of this chapter, I was able to quickly create some reports and develop a
dashboard based on HDFS data. When I create Talend reports based on Hive table data, I can start to think about the
quality of the data that's residing on HDFS and Hive.
As you remember from Chapter 9, you can create Hive external tables to represent HDFS data. In this section,
I create reports that represent the column data in the Hive rawtrans table of the trade information database. The
content of the data in that table is not relevant; it is the functionality of the Talend data-quality reports that I
concentrate on here.
To create the reports used in this section, I first need two data-quality rules, which I create in the Repository pane
under Libraries, then Rules, then SQL. I also need one regular expression pattern, which I create under Libraries, then
Patterns, then Regex, then Date. The date pattern is copied from a similar pre-existing pattern in the same location,
called date MM DD YYYY; I simply right-click it and select Duplicate.
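To make the idea of a date pattern concrete, here is a minimal Python sketch of the kind of check such a pattern performs on a column of values. The regular expression, the function names, and the sample values are my own assumptions for illustration; this is not the expression that Talend ships in its date MM DD YYYY pattern, and it may differ in the separators and ranges it accepts.

```python
import re
from datetime import datetime

# Hypothetical MM DD YYYY pattern (an assumption, not Talend's own expression):
# a two-digit month 01-12, a two-digit day 01-31, and a four-digit year,
# separated by a space, slash, dot, or hyphen.
DATE_MM_DD_YYYY = re.compile(
    r"(0[1-9]|1[0-2])[ /.-](0[1-9]|[12][0-9]|3[01])[ /.-](\d{4})"
)


def matches_date_pattern(value: str) -> bool:
    """Return True if the whole string has the MM DD YYYY shape."""
    return DATE_MM_DD_YYYY.fullmatch(value) is not None


def is_real_date(value: str) -> bool:
    """Return True if the string has the right shape and is a calendar date."""
    match = DATE_MM_DD_YYYY.fullmatch(value)
    if match is None:
        return False
    month, day, year = (int(part) for part in match.groups())
    try:
        datetime(year, month, day)
    except ValueError:
        # The shape is right, but the date does not exist (for example 02/30).
        return False
    return True


def pattern_match_rate(values: list[str]) -> float:
    """Return the fraction of values that match the pattern (0.0 if empty)."""
    if not values:
        return 0.0
    matched = sum(1 for value in values if matches_date_pattern(value))
    return matched / len(values)


if __name__ == "__main__":
    # Made-up sample column values, used only to exercise the functions.
    sample = ["01/15/2014", "02/30/2014", "2014-01-15", "13/01/2014"]
    for value in sample:
        print(value, matches_date_pattern(value), is_real_date(value))
    print("match rate:", pattern_match_rate(sample))
```

The sketch separates the shape check from the calendar check because a regular expression alone accepts values such as 02/30/2014, which have the right form but are not real dates. A pattern-based report counts matches on form only.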
 