Record screening models can play another important role: they can be used as
a data exploration tool before another data mining model is developed. Some
models, especially those with a statistical origin, can be affected by the
presence of abnormal cases, which may lead to poor or biased solutions. It is
always a good idea to identify these cases in advance and examine them thoroughly
before deciding on their inclusion in subsequent analysis.
Modified standard data mining techniques, like clustering models, can be
used for the unsupervised detection of anomalies. Outliers can often be found
among cases that do not fit well in any of the resulting clusters or in sparsely
populated clusters. Thus, the usual tactic for uncovering anomalous records is to
develop an exploratory clustering solution and then further investigate the results.
A specialized technique in the field of unsupervised record screening is IBM SPSS
Modeler's Anomaly Detection. It is an exploratory technique based on clustering.
It provides a quick, preliminary data investigation and suggests a list of records with
odd data patterns for further investigation. It evaluates each record's "normality"
in a multivariate context and not on a per-field basis by assessing all the inputs
together. More specifically, it identifies peculiar cases by deriving a cluster solution
and then measuring the distance of each record from its cluster central point,
the centroid. An anomaly index is then calculated that represents the proximity of
each record to the other records in its cluster. Records can be sorted according
to this measure and then flagged as anomalous according to a user-specified
threshold value. What is interesting about this algorithm is that it provides the
reasoning behind its results. For each anomalous case it displays the fields with the
unexpected values that do not conform to the general profile of the record.
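
The general idea described above (cluster the records, measure each record's
distance from its centroid, derive an index, and flag records above a threshold)
can be illustrated with a minimal Python sketch. This is not the algorithm that
IBM SPSS Modeler implements. It is a simplified stand-in that assumes a plain
k-means clustering, Euclidean distance, and an index defined here as a record's
distance divided by the average distance within its cluster. The function names,
the sample data, and the threshold of 2.0 are all hypothetical choices made for
illustration.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(points, centroids, iterations=20):
    """Plain k-means starting from the given initial centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def anomaly_report(points, centroids, threshold=2.0):
    """Score records by distance to centroid relative to the cluster average."""
    assigned = [nearest(p, centroids) for p in points]
    totals = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for idx, d in assigned:
        totals[idx] += d
        counts[idx] += 1
    report = []
    for row, (p, (idx, d)) in enumerate(zip(points, assigned)):
        avg = totals[idx] / counts[idx]
        index = d / avg if avg > 0 else 0.0
        # The field with the largest squared deviation from the centroid
        # serves as a rough "reason" for the score.
        deviations = [(x - c) ** 2 for x, c in zip(p, centroids[idx])]
        top_field = max(range(len(deviations)), key=deviations.__getitem__)
        report.append({
            "row": row,
            "cluster": idx,
            "index": index,
            "top_field": top_field,
            "anomalous": index > threshold,
        })
    # Highest index first, so the most unusual records lead the list.
    return sorted(report, key=lambda r: r["index"], reverse=True)

# Hypothetical data: two tight groups plus one odd record (the last row).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (1.0, 1.2),
    (8.0, 8.0), (8.1, 7.9), (7.9, 8.2), (8.2, 8.1), (8.0, 7.8),
    (1.0, 4.0),
]
centroids = kmeans(points, [points[0], points[5]])
report = anomaly_report(points, centroids)
for entry in report[:3]:
    print(entry)
```

Because every field contributes to the distance, a record is scored on its
overall profile rather than field by field, while the largest per-field deviation
gives a simple indication of why it was flagged. In practice the inputs would
need to be standardized first so that no single field dominates the distance.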
Supervised and Unsupervised Models for Detecting Fraud
Unsupervised record screening techniques can be applied for fraud detection
even in the absence of recorded fraudulent events. If past fraudulent cases
are available, analysts can try a supervised classification model to identify
the input data patterns associated with the target suspicious activities. The
supervised approach has strengths since it works in a more targeted way than
unsupervised record screening. However, it also has specific disadvantages.
Since fraudsters' behaviors may change and evolve over time, supervised
models trained on past cases may soon become outdated and fail to capture
new tricks and new types of suspicious patterns. Additionally, the list of
past fraudulent cases, which is necessary for the training of the classification
model, is often biased and partial. It depends on the specific rules and criteria