kernel functions allow for sufficiently rich feature spaces, the performances of SVMs
are comparable in terms of out-of-sample forecasting accuracy (Vapnik, ).
4.3 Company Score Evaluation
The company score is computed as:

f(x) = x^{\top} w + b ,     ( . )

where w = \sum_{i=1}^{n} \alpha_i y_i x_i and b = -\tfrac{1}{2} (x_+ + x_-)^{\top} w; x_+ and x_- are the observations from the opposite classes for which constraint ( . ) becomes an equality. By substituting the scalar product with a kernel function, we will derive a nonlinear score function:

f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b .     ( . )
The nonparametric score function ( . ) does not have a compact closed-form representation. This means that graphical tools are required to visualise it.
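To make the computation concrete, the following minimal sketch evaluates a score of this form in Python, assuming the coefficients α_i, the labels y_i, the training observations x_i, and the threshold b have already been obtained from training the SVM; the RBF kernel and all function names are illustrative choices, not the chapter's own implementation.

```python
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel; any admissible kernel K could be used instead."""
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def svm_score(x, X_train, y_train, alphas, b, kernel=rbf_kernel):
    """Nonparametric score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, y_train, X_train)) + b
```

Because f has no compact closed form, one would evaluate such a score function on a grid of inputs in order to visualise it, as suggested above.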
4.4 Variable Selection
In this section we describe the procedure and the graphical tools for selecting the
variables of the SVM model used in forecasts. We have two very important model
accuracy criteria: the accuracy ratio (AR), which will be used here as a criterion for
model selection (Fig. . ), and the percentage of correctly classified out-of-sample
observations. Higher values of either criterion indicate better model accuracy.
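As an illustration of the first criterion, the sketch below computes the AR from company scores and insolvency indicators via the standard identity AR = 2·AUC − 1 between the accuracy ratio (Gini coefficient) and the area under the ROC curve; the sign convention (higher scores for insolvent companies) and the function name are assumptions, not taken from the chapter.

```python
import numpy as np

def accuracy_ratio(scores, insolvent):
    """Accuracy ratio (AR) from scores and insolvency indicators.

    Uses AR = 2 * AUC - 1, where AUC is the probability that a randomly
    chosen insolvent company receives a higher score than a randomly
    chosen solvent one (ties counted as 1/2).
    """
    scores = np.asarray(scores, dtype=float)
    insolvent = np.asarray(insolvent, dtype=bool)
    pos = scores[insolvent]        # insolvent companies
    neg = scores[~insolvent]       # solvent companies
    # fraction of (insolvent, solvent) pairs that are ranked correctly
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))
    return 2.0 * auc - 1.0
```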
Model selection proceeds from the simplest (i.e. univariate) models to the one
with the highest AR. The problem that arises is: how do we determine the variable
that provides the highest AR across possible data samples? For a parametric model,
we would need to estimate the distribution of the coefficients of the variables and
therefore their confidence intervals. This approach, however, is not applicable to
nonparametric models.
Instead we can compare models using an accuracy measure, in our case the AR. We
first estimate the AR distributions for different models. This can be done using boot-
strapping (Horowitz, ). We randomly select training and validation sets, each of
which is a subsample of solvent and insolvent companies. We use a / ratio
since this is the worst case with the minimum AR. The two sets do not overlap, i.e.
they do not contain common observations. For each of these sets we apply the SVM
with the parameters that provide the highest AR for bivariate models (Fig. . ) and esti-
mate the ARs. Then we perform a Monte Carlo experiment: we repeat this process of
generating subsamples and computing ARs times. Each time we record the ARs,
and then we estimate their distribution.
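A minimal sketch of this bootstrap experiment is given below. It uses scikit-learn's SVC together with the accuracy_ratio function from the earlier sketch; the subsample sizes, the fixed (C, gamma) values, and the use of simple random rather than class-balanced subsampling are illustrative assumptions, since the chapter's exact sampling ratio and repetition count are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def bootstrap_ar_distribution(X, y, n_rep=100, n_train=200, n_val=200,
                              C=1.0, gamma=0.25, seed=0):
    """Monte Carlo estimate of the AR distribution for one candidate model.

    X: (n, d) array of the selected financial ratios;
    y: (n,) array with 1 = insolvent, 0 = solvent.
    Each repetition draws disjoint training and validation subsamples,
    fits an SVM with fixed parameters, and records the validation AR.
    """
    rng = np.random.default_rng(seed)
    ars = []
    for _ in range(n_rep):
        idx = rng.permutation(len(y))
        train = idx[:n_train]
        val = idx[n_train:n_train + n_val]      # no overlap with the training set
        svm = SVC(C=C, gamma=gamma, kernel="rbf").fit(X[train], y[train])
        scores = svm.decision_function(X[val])  # company scores on the validation set
        ars.append(accuracy_ratio(scores, y[val] == 1))
    return np.array(ars)                        # empirical AR distribution
```

Comparing the resulting empirical AR distributions, for example by their medians or with box plots, then indicates which candidate variable yields the highest AR.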