Database Reference
In-Depth Information
For instance, Figure 11-22 tests the size of the trade.rawtrans table “amount” column by checking that no
transaction values exceed 10,000 English pounds sterling. The text in the Where clause field is added to the Hive QL
that is generated by Talend and is run against Hive to create the report's content. In this example, I have not amended
other fields except to add the rule's name, purpose, and description.
Figure 11-23 is a similar SQL-based data-quality rule, except that in its Where clause field it uses the Hive QL
length function to check the length of the rawtrans table's Supplier field data in the trade database. Of course, this field
is a string, so this check ensures that a maximum length for the supplier name is adhered to. Checks like this are useful
during data-migration exercises.
Figure 11-23. SQL data-quality rule for “Supplier” column
Now that the rules have been created, it is possible to create some Talend reports on data quality in the trade.
rawtrans Hive table. I concentrate on column-based reports and create single- and multi-column reports, as well as
reports based on SQL and regular expressions. (Note: you can create new reports by right-clicking the Analysis folder
in the Repository pane and selecting the New Analysis option.) Each type of report has the set of control buttons
shown in Figure 11-24 .
Figure 11-24. Control buttons for reports
The first two buttons shown in Figure 11-24 contract and expand the report display to either hide or show charts
and/or details. The third button saves any changes, while the fourth is used to run the report. The final “eye” icon
refreshes the report charts.
 
Search WWH ::




Custom Search