Database Reference
In-Depth Information
The Fuzzy Lookup transformation follows a similar transformation process in a package data
flow—it tries to find an exact match. If the search fails, then it provides a close match from the
reference table. This search process uses an equi-join method to locate matching records.
The Fuzzy Grouping transformation requires the connection to the SQL Server for creating
temporary tables within TEMPDB database-that transforms required algorithm.
The Fuzzy components lookup and grouping will transform the data from the reference table
and add an additional column for storing the tokens for each row. Fuzzy Lookup uses an
Error-Tolerant-Index ( ETI ) to find the matching rows in the reference table. The ETI
options will provide a way to store the index table for later updates based on the changes that
are made to the reference data. To obtain optimum performance, ETL provides further options
such as:
F Generates New Index: It is best used when the reference data is small, for quicker
data transformation
F Stores New Index: It is best used with existing index; and to generate the index, the
package must be executed
At runtime, the Fuzzy Grouping transform adds three key fields to the output stream:
F _key_in : It is used to uniquely identify each row based on the reference column
generated key
F _key_out : It is a column that identifies a group of duplicate rows
F _score : It is the similarity between the _key_in and _key_out values where the
value between 0 and 1 indicates the similarity of the input row.
The fuzzy lookup transform is useful to find correct values based on similarity and parameters
matching during a lookup and the fuzzy grouping is useful to find duplicated values. Both
these are very useful to eliminate any administrative/human errors.
 
Search WWH ::




Custom Search