An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation

Record screening models can also play another important role: they can be used as a data exploration tool before another data mining model is developed. Some models, especially those of statistical origin, can be affected by the presence of abnormal cases, which may lead to poor or biased solutions. It is always a good idea to identify these cases in advance and examine them thoroughly before deciding whether to include them in subsequent analysis.
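
To make the risk concrete, here is a minimal Python sketch with made-up numbers. It fits an ordinary least-squares slope with and without a single extreme record. Least squares is used only as a familiar example of a statistically based model, and the data points are hypothetical.

```python
# Illustrative sketch: one abnormal record can bias a least-squares fit.
# The data points are hypothetical, chosen only to make the effect visible.

def ols_slope(xs, ys):
    """Return the ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

clean_x = [1, 2, 3, 4]
clean_y = [1, 2, 3, 4]            # perfectly linear pattern
slope_clean = ols_slope(clean_x, clean_y)

# Add a single abnormal case far from the pattern of the others.
dirty_x = clean_x + [5]
dirty_y = clean_y + [40]
slope_dirty = ols_slope(dirty_x, dirty_y)

print(f"slope without outlier: {slope_clean:.2f}")  # 1.00
print(f"slope with outlier:    {slope_dirty:.2f}")  # 8.00
```

A single record multiplies the estimated slope by eight, which is why such cases are worth examining before they are allowed into a model.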

Standard data mining techniques, such as clustering models, can be adapted for the unsupervised detection of anomalies. Outliers are often found among cases that do not fit well into any of the resulting clusters, or in sparsely populated clusters. Thus, the usual tactic for uncovering anomalous records is to develop an exploratory clustering solution and then investigate the results further.
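
The tactic just described can be sketched in plain Python. The sketch assumes numeric, already comparable inputs and uses a bare-bones k-means with hand-picked starting centroids. It then applies two illustrative rules: a record is flagged if its cluster has fewer than `min_cluster_size` members, or if its distance to its centroid exceeds `ratio_threshold` times the average distance in its cluster. The function names, thresholds and data are hypothetical choices, not part of any product.

```python
import math

def kmeans(points, init_idx, max_iter=100):
    """Bare-bones k-means; starting centroids are the points at init_idx."""
    centroids = [points[i] for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the solution has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

def screen(points, labels, centroids, min_cluster_size=2, ratio_threshold=2.0):
    """Flag records in sparse clusters or far from their own centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        mean_d = sum(dists[i] for i in members) / len(members)
        for i in members:
            if len(members) < min_cluster_size:
                flagged.append((i, "sparse cluster"))
            elif mean_d > 0 and dists[i] / mean_d > ratio_threshold:
                flagged.append((i, "far from centroid"))
    return sorted(flagged)

points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (1.0, 0.8),  # group A
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (5.0, 4.8),  # group B
    (9.0, 1.0),                                                  # isolated record
    (2.5, 3.0),                                                  # poor fit to group A
]
labels, centroids = kmeans(points, init_idx=[0, 5, 10])
flagged = screen(points, labels, centroids)
print(flagged)  # [(10, 'sparse cluster'), (11, 'far from centroid')]
```

The isolated record ends up alone in a sparsely populated cluster, while the record at index 11 joins group A but lies roughly three times farther from the centroid than its peers do on average.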

A specialized technique in the field of unsupervised record screening is IBM SPSS Modeler's Anomaly Detection. It is an exploratory technique based on clustering. It provides a quick, preliminary investigation of the data and suggests a list of records with odd data patterns for further investigation. It evaluates each record's "normality" in a multivariate context rather than on a per-field basis, by assessing all the inputs together. More specifically, it identifies peculiar cases by deriving a cluster solution and then measuring the distance of each record from the central point of its cluster, the centroid. An anomaly index is then calculated that expresses how far each record lies from its cluster relative to the other records in that cluster. Records can be sorted according to this measure and then flagged as anomalous according to a user-specified threshold value. What is interesting about this algorithm is that it provides the reasoning behind its results: for each anomalous case, it displays the fields with unexpected values that do not conform to the general profile of the record's cluster.
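
The steps just listed (measure each record's distance from its centroid, compute an index relative to the cluster's peers, sort, apply a threshold and report the fields behind each flag) can be illustrated with a simplified Python sketch. It does not reproduce the algorithm implemented in IBM SPSS Modeler. Here the cluster labels are assumed to come from an earlier clustering step, fields are scaled by their overall range, the distance is Euclidean, and the anomaly index is taken to be a record's distance divided by the average distance in its cluster. The function name `anomaly_report`, the field names and the data are hypothetical.

```python
import math

def anomaly_report(records, labels, fields, threshold=2.0, n_reasons=1):
    """Hypothetical sketch: score each record against its own cluster."""
    # Scale every field by its overall range so that fields are comparable.
    ranges = {}
    for f in fields:
        values = [r[f] for r in records]
        ranges[f] = (max(values) - min(values)) or 1.0

    # Centroid (mean profile) of each cluster.
    centroids = {}
    for c in set(labels):
        members = [r for r, lab in zip(records, labels) if lab == c]
        centroids[c] = {f: sum(r[f] for r in members) / len(members) for f in fields}

    # Per-field squared deviation from the centroid, and the overall distance.
    contributions, distances = [], []
    for r, lab in zip(records, labels):
        contrib = {f: ((r[f] - centroids[lab][f]) / ranges[f]) ** 2 for f in fields}
        contributions.append(contrib)
        distances.append(math.sqrt(sum(contrib.values())))

    # Average distance of the members of each cluster.
    mean_dist = {}
    for c in set(labels):
        ds = [d for d, lab in zip(distances, labels) if lab == c]
        mean_dist[c] = sum(ds) / len(ds)

    report = []
    for i, (d, lab) in enumerate(zip(distances, labels)):
        # Anomaly index: distance relative to the cluster's average distance.
        index = d / mean_dist[lab] if mean_dist[lab] > 0 else 0.0
        # Reasons: the fields contributing most to the record's deviation.
        reasons = sorted(fields, key=lambda f: contributions[i][f], reverse=True)[:n_reasons]
        report.append((i, index, index >= threshold, reasons))

    # Most unusual records first.
    return sorted(report, key=lambda row: row[1], reverse=True)

# Hypothetical customer records: monthly spend, service calls, tenure in months.
fields = ["spend", "calls", "tenure"]
records = [
    {"spend": 50, "calls": 5, "tenure": 24},
    {"spend": 55, "calls": 6, "tenure": 26},
    {"spend": 52, "calls": 5, "tenure": 25},
    {"spend": 48, "calls": 4, "tenure": 23},
    {"spend": 51, "calls": 30, "tenure": 24},   # unusual number of calls
    {"spend": 200, "calls": 2, "tenure": 60},
    {"spend": 210, "calls": 3, "tenure": 62},
    {"spend": 205, "calls": 2, "tenure": 61},
]
labels = [0, 0, 0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step

report = anomaly_report(records, labels, fields)
for i, index, flagged, reasons in report:
    if flagged:
        print(f"record {i}: anomaly index {index:.2f}, main reason: {reasons[0]}")
# record 4: anomaly index 2.48, main reason: calls
```

Only record 4 crosses the threshold, and the report names `calls` as the field that sets it apart from its cluster. Raising `n_reasons` lists more contributing fields per record.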

Supervised and Unsupervised Models for Detecting Fraud

Unsupervised record screening techniques can be applied for fraud detection even in the absence of recorded fraudulent events. If past fraudulent cases are available, analysts can try a supervised classification model to identify the input data patterns associated with the target suspicious activities. The supervised approach has its strengths, since it works in a more targeted way than unsupervised record screening. However, it also has specific disadvantages. Because fraudsters' behavior may change and evolve over time, supervised models trained on past cases may soon become outdated and fail to capture new tricks and new types of suspicious patterns. Additionally, the list of past fraudulent cases, which is necessary for training the classification model, is often biased and partial. It depends on the specific rules and criteria
