Filtering, Sorting, and Other Processing Techniques - Getting Started with Talend Open Studio for Data Integration

Data denormalization

Denormalization is, of course, the reverse process and is analogous to a "group by"

in SQL. For the next exercise, create a denormalization job which reverses the data

flows we built in the previous exercise. Use the denormalize.csv file (from the

resources directory of this chapter) as the input data. You will need delimited input,

denormalize, and delimited output components.

For comparison, there is a denormalize job in the job directory of this chapter.
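
To make the "group by" analogy concrete, here is a minimal Python sketch of what denormalization does to a data flow. It is not how Talend implements its denormalize component; the function name denormalize, the two-column (key, value) layout, the comma delimiter, and the sample rows are all assumptions made for illustration, and they do not reflect the actual contents of denormalize.csv.

```python
def denormalize(rows, delimiter=","):
    """Collapse (key, value) rows into one row per key.

    Values that share a key are joined with `delimiter`, in the order
    they were read. Keys keep their first-seen order.
    """
    grouped = {}
    for key, value in rows:
        # Collect every value seen for this key.
        grouped.setdefault(key, []).append(value)
    return [(key, delimiter.join(values)) for key, values in grouped.items()]


# Hypothetical normalized input: one value per row.
normalized_rows = [("1", "red"), ("1", "blue"), ("2", "green")]

for key, values in denormalize(normalized_rows):
    print(key, values)
```

Each key ends up on a single row, with its values concatenated into one delimited field, which is the reverse of splitting that field into one row per value.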

Extracting delimited fields

As we have seen, some systems may store data in a denormalized form and, in the

previous section, we saw how we could normalize the data. In essence, we were

turning the data from a column into rows. However, with some data, we may wish

to split a field not into rows, but into individual columns. For example,

suppose a system stores its employee data with the following schema:

[employee_id] | [name]

And the name field holds the first name and last name of the employee in the

following format:

[last_name], [first_name]

An example file is shown as follows:

Note that the schema does not have three fields, but that the second

field contains the first and last name, separated by a comma.
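
The idea of extracting delimited fields can be sketched in a few lines of Python. This is only an illustration of the transformation, not the Talend job itself; the function name extract_name_fields, the pipe character as the field separator, and the sample record are assumptions, since the example file is not reproduced here.

```python
def extract_name_fields(line, field_sep="|", name_sep=","):
    """Split an 'employee_id|last_name, first_name' record into three columns.

    The separators are assumptions for this sketch; adjust them to match
    the real file.
    """
    # First split the record into its two schema fields.
    employee_id, name = line.split(field_sep, 1)
    # Then split the name field on the comma into last and first name.
    last_name, first_name = name.split(name_sep, 1)
    return employee_id.strip(), first_name.strip(), last_name.strip()


# Hypothetical record in the [employee_id] | [last_name], [first_name] layout.
print(extract_name_fields("1|Smith, John"))
```

The input record has two fields, and the output has three columns: the second field has been split into separate first name and last name columns rather than into extra rows.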
