Administering Core Business Intelligence Services - Microsoft SQL Server 2008 R2 Administration

Database Reference

In-Depth Information

The Fuzzy Lookup transformation follows a similar transformation process in a package data

flow—it tries to find an exact match. If the search fails, then it provides a close match from the

reference table. This search process uses an equi-join method to locate matching records.

The Fuzzy Grouping transformation requires the connection to the SQL Server for creating

temporary tables within TEMPDB database-that transforms required algorithm.

The Fuzzy components lookup and grouping will transform the data from the reference table

and add an additional column for storing the tokens for each row. Fuzzy Lookup uses an

Error-Tolerant-Index ( ETI ) to find the matching rows in the reference table. The ETI

options will provide a way to store the index table for later updates based on the changes that

are made to the reference data. To obtain optimum performance, ETL provides further options

such as:

F Generates New Index: It is best used when the reference data is small, for quicker

data transformation

F Stores New Index: It is best used with existing index; and to generate the index, the

package must be executed

At runtime, the Fuzzy Grouping transform adds three key fields to the output stream:

F _key_in : It is used to uniquely identify each row based on the reference column

generated key

F _key_out : It is a column that identifies a group of duplicate rows

F _score : It is the similarity between the _key_in and _key_out values where the

value between 0 and 1 indicates the similarity of the input row.

The fuzzy lookup transform is useful to find correct values based on similarity and parameters

matching during a lookup and the fuzzy grouping is useful to find duplicated values. Both

these are very useful to eliminate any administrative/human errors.

Search WWH ::

Custom Search

Home