Examining the Model Errors to Reveal Anomalous or Even Suspect Cases
The examination of deviations of the predicted from the actual values can
also be used to identify outlier or abnormal cases. These cases may simply
indicate poor model performance or an unusual but acceptable behavior.
Nevertheless, they deserve special inspection since they may also be signs of
suspect behavior.
For instance, an insurance company can build an estimation model of claim
amounts, using the claim application data as predictors.
The resulting model can then be used as a tool to detect fraud. Entries that
substantially deviate from the expected values could be identified and further
examined or even sent to auditors for manual inspection.
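The idea of inspecting model errors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by the text: it assumes a single numeric predictor, a straight-line estimation model fitted by least squares, and a robust cutoff based on the median absolute deviation (MAD) of the residuals. The function names, the cutoff k = 3 and the sample figures are all hypothetical.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_large_residuals(xs, ys, k=3.0):
    """Return the indices of records whose residual is unusually large.

    A residual is the actual value minus the predicted value. A record is
    flagged when its residual lies more than k scaled MADs from the median
    residual. The cutoff k is an assumption, not a recommended setting.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    centre = median(residuals)
    mad = median([abs(r - centre) for r in residuals])
    if mad == 0:
        return []  # no spread in the residuals, nothing to compare against
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the residuals are roughly normal.
    cutoff = k * 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r - centre) > cutoff]


# Hypothetical data: assessed damage (predictor) and claimed amount (target).
assessed = [1, 2, 3, 4, 5, 6, 7, 8]
claimed = [1.1, 2.0, 2.9, 4.2, 5.0, 12.0, 7.1, 7.9]
suspects = flag_large_residuals(assessed, claimed)
```

The flagged indices are only candidates for inspection. As noted above, a large deviation may reflect poor model performance or unusual but acceptable behavior rather than fraud.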
UNSUPERVISED MODELING TECHNIQUES
In the previous sections we briefly presented the supervised modeling techniques.
Whether used for classification, estimation, or field screening, their common
characteristic is that they all involve a target attribute which must be associated
with an examined set of inputs. The model training and data pattern recognition
are guided or supervised by a target field. This is not the case in unsupervised
modeling, in which only input fields are involved. All inputs are treated equally
in order to extract information that can be used, mainly, for the identification of
groupings and associations.
Clustering techniques identify meaningful natural groupings of records and
group customers into distinct segments with internal cohesion. Data reduction
techniques like factor analysis or principal components analysis (PCA) "group"
fields into new compound measures and reduce the data's dimensionality without
losing much of the original information. But grouping is not the only application
of unsupervised modeling. Association or affinity modeling is used to discover
co-occurring events, such as purchases of related products. It has been developed
as a tool for analyzing shopping cart patterns and that is why it is also referred to
as market basket analysis. By adding the time factor to association modeling we
have sequence modeling: in sequence modeling we analyze associations over time
and try to discover the series of events, the order in which events happen. And
that is not all. Sometimes we are just interested in identifying records that "do
not fit well," that is, records with unusual and unexpected data patterns. In such
cases, record screening techniques can be employed to detect abnormal (anomalous)
records, as a data auditing step before building a subsequent model.
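To make association modeling more concrete, the following Python sketch counts co-occurring products in a handful of shopping baskets and derives two common measures for each product pair: support (the share of baskets containing both products) and confidence (the share of baskets containing the first product that also contain the second). It is a simplified illustration limited to pairs of items. The function name, the minimum support threshold and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs.

    Only pairs appearing in at least min_support of the baskets are kept.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter"],
    ["bread", "butter"],
]
rules = pair_rules(baskets)
```

Here bread and butter appear together in three of the five baskets, so the pair passes the threshold, while the pairs involving milk do not. Sequence modeling would additionally require the order or time of each purchase, which this sketch does not track.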