Data denormalization
Denormalization is the reverse process of normalization and is analogous to a "group by"
in SQL. For the next exercise, create a denormalization job that reverses the data
flow we built in the previous exercise. Use the denormalize.csv file (from the
resources directory of this chapter) as the input data. You will need delimited input,
denormalize, and delimited output components.
For comparison, there is a denormalize job in the job directory of this chapter.
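To make the "group by" analogy concrete, the following is a minimal Python sketch of what a denormalize step does: rows that share a key are collapsed into a single row whose values are joined with a separator. The function name, the sample rows, and the comma separator are assumptions for illustration only; they are not taken from denormalize.csv or from the job supplied with this chapter.

```python
def denormalize(rows, key_index=0, value_index=1, separator=","):
    """Collapse rows that share a key into one row with joined values."""
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        # Collect every value seen for this key, in input order.
        grouped.setdefault(row[key_index], []).append(row[value_index])
    return [(key, separator.join(values)) for key, values in grouped.items()]


# Hypothetical normalized input: one (key, value) pair per row.
sample_rows = [("1", "red"), ("1", "blue"), ("2", "green")]
print(denormalize(sample_rows))
```

Each key appears once in the result, with its values merged into a single delimited field, which is the opposite of the one-value-per-row layout produced by normalization.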
Extracting delimited fields
As we have seen, some systems store data in a denormalized form and, in the
previous section, we saw how to normalize such data. In essence, we were
turning the values held in a single column into separate rows. With some data,
however, we may wish to split the denormalized values not into rows, but into
individual columns. For example, suppose a system stores its employee data with
the following schema:
[employee_id] | [name]
The name field holds the first name and last name of the employee in the
following format:
[last_name], [first_name]
An example file is shown as follows:
Note that the schema has two fields, not three: the second field contains
both the last name and the first name, separated by a comma.
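As a rough illustration of extracting the delimited field into individual columns, the Python sketch below splits one record into employee_id, first_name, and last_name. It assumes the two schema fields are separated by a pipe character and that the name always contains exactly one comma; the function name and the sample record are hypothetical and do not come from the chapter's example file.

```python
def extract_name_fields(line, field_separator="|"):
    """Split '[employee_id] | [last_name], [first_name]' into three columns."""
    # First split the record into its two schema fields (assumed pipe-delimited).
    employee_id, name = [part.strip() for part in line.split(field_separator, 1)]
    # Then split the name field on the first comma: last name comes first.
    last_name, first_name = [part.strip() for part in name.split(",", 1)]
    return employee_id, first_name, last_name


# Hypothetical record following the schema described above.
print(extract_name_fields("1001|Smith, John"))
```

The result is one row with three columns rather than several rows, which is the difference between this kind of extraction and the row-based normalization of the previous section.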