Table . . Cases with more liabilities than assets, and their bankruptcy status
(Columns: Bankrupt, OK. Rows: two groups defined by TL.TA. Cell values in %.)
conclusions about the fact that none of the biggest companies went bankrupt and that
none of the companies that went bankrupt had a high cash ratio (even though the
median is higher for these companies), but the selected cases make up such a small
percentage of the total that caution should be exercised before drawing any conclusions.
3.9 Summary
Every data analysis is unique because the data are always different. In the study
reported here, there were mainly continuous variables (so parallel coordinate plots
were useful); the few categorical variables usually had large numbers of categories (so
these had to be combined into groups); there was a fairly large number of cases (so
approximating density estimations were helpful); and there were some variables that
were highly skewed (so that outliers and transformations were issues). A variety of
plots were used, including barcharts, histograms, spinograms, boxplots, scatterplots,
mosaicplots and parallel coordinate plots. Weighted versions of some plots also
contributed. Trellis displays might have been tried, but then shingling of the conditional
variables would have been required. That would be more appropriate after modeling.
Interactivity, primarily selection, querying and linking, was used extensively, as
is clear from the plots, but zooming and reformatting were also used a lot in the
exploratory analyses. It is not easy to illustrate EDA in print, and the chapter can only
convey a pale shadow of the actual process.
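Two of the preparatory steps mentioned above, combining sparse categories into groups and transforming skewed variables, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function names, the `min_count` threshold, the "Other" label, the choice of a base-10 logarithm and the toy values are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def lump_rare_categories(values, min_count, other_label="Other"):
    """Replace every category seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def log10_transform(values):
    """Return log10 of each strictly positive value, and None for the rest.

    Zero and negative values have no logarithm, so they are passed through
    as None and can be inspected separately.
    """
    return [math.log10(v) if v > 0 else None for v in values]


# Hypothetical toy data, invented for illustration only.
sectors = ["Retail", "Retail", "Mining", "Retail", "Farming", "Services", "Services"]
assets = [1_000.0, 10.0, 100_000.0, 0.0]

grouped = lump_rare_categories(sectors, min_count=2)
logged = log10_transform(assets)

print(grouped)
print(logged)
```

Returning None for non-positive values, rather than shifting the data before taking logs, is a design choice of this sketch. It keeps problem values visible instead of silently altering them.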
Data exploration is an important part of any data analysis. It is necessary to learn
about the data, to check data quality and to carry out the data cleaning that is needed
(and data cleaning is always needed with real datasets). EDA revealed here that there
were some extreme outliers and some suspicious negative values. It underlined the
need to transform some of the variables and it highlighted the geographic and
sectoral structure of the dataset. It also revealed the surprising age of some of the data
and the unexpected stability of the size distribution over time. Several interesting
associations between variables were uncovered. An investigation of the factors
influencing bankruptcy provided further insight into the data and prepared the ground
for statistical modeling.
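In the study these problems were found graphically. As a rough numeric counterpart, the sketch below flags negative values and applies the conventional boxplot rule (values more than k interquartile ranges beyond the quartiles) to flag extreme outliers. The rule, the multiplier k = 3, the function names and the toy ratios are assumptions made for illustration, not the authors' procedure.

```python
import statistics


def negative_indices(values):
    """Return the positions of values below zero."""
    return [i for i, v in enumerate(values) if v < 0]


def extreme_outlier_indices(values, k=3.0):
    """Return positions of values more than k * IQR outside the quartiles.

    Quartiles come from statistics.quantiles with the 'inclusive' method;
    k = 3 is a common convention for 'extreme' outliers (an assumption here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical toy ratios, invented for illustration only.
ratios = [0.2, 0.4, 0.5, 0.6, 0.8, -0.3, 25.0]

suspicious = negative_indices(ratios)
extreme = extreme_outlier_indices(ratios)

print(suspicious)
print(extreme)
```

A check of this kind only produces candidates. Whether a flagged value is an error or a genuine feature of the data still has to be judged by looking at the cases themselves.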
Applying statistical models before exploring data is an inefficient approach.
Problems may arise because of peculiarities in the data. Features are revealed that could
have been found much more easily just by looking. The only possible advantage of