3.1 Introduction
The first stages of any data analysis are to get to know the aims of the study and to
get to know the data. In this study the main goal is to predict a company's chances of
going bankrupt based on its recent financial returns. In another chapter of the
Handbook, some sophisticated prediction models based on support vector machines are
discussed for a similar dataset. Here, visualization methods are used to explore the
large dataset of American company accounts that was made available for predicting
bankruptcy in order to get to know the data and to assess the quality of the dataset.
This is an initial exploratory analysis that does not use any expert accounting
knowledge.
Exploratory data analysis (EDA) has been a well-known term in the field of
statistics since Tukey's historic book (Tukey, ). While everyone acknowledges the
importance of EDA, little else has been written about it, and modern methods -
including interactive graphics (Unwin, ) - are not as commonly applied in practice as
they might be. Interactive graphics were used extensively in the exploratory work for
this chapter. The dataset is by no means particularly big but it does contain more than
cases. Ways of graphically displaying large datasets are discussed in detail in
Unwin et al. ( ).
When considering graphic displays, it is necessary to distinguish between
presentation and exploratory graphics. Graphics for displaying single variables or pairs of
variables are often used to present distributions of data or to present results. Care
must be taken with scales, with aspect ratios, with legends, and with every graphical
property that may affect the success of the display at conveying information to others.
Graphics for exploration are very different. They are more likely to be multivariate,
and there is no need to be particularly concerned about the aesthetic features of the
graphic; the important thing is that they give a clear representation of the data. Pre-
sentation graphics are drawn to be inspected by many people (possibly millions of
people, if they are used on television), and they can be long-lived. For example, Play-
fair's plots (Playfair, ) of English trade data are over years old, but are still
informative. Exploratory graphics are drawn for one or two people and may be very
short-lived. A data analyst may examine a large number of graphics before finding
one that reveals information, and, having found that information, the analyst may
decide that another kind of display is actually better at presenting it than the display
or displays that led to its discovery.
The graphics shown in this chapter are a subset of those used in the study. They are
not supposed to be “pretty;” they have been drawn to do a job. Detailed scales have
not been included, as it is always the distributional form that is of primary interest.
Exploratory analyses are subjective, and each analyst may choose different graphics
and different combinations of them to uncover information from data. The only
thing that can be said with certainty is that analysts who do not use graphics to get to
know their data will have a poor understanding of the properties of the data they are
working with. If nothing else, graphics are extremely useful for assessing the quality
of data and for cleaning data.