Visualizing Functional Data with an Application to eBay’s Online Auctions - Data Visualization


The actual choice of the smoothing parameters is often driven by the context. In our application, we pick the location and number of knots to reflect the bid-arrival distribution, which is densest for the last day, and in particular for the last few moments of the auction. The choice of p depends, among other things, on whether higher-order derivatives of the curve are also desired. The value of the penalty term λ is chosen by inspecting the resulting functional objects in order to ensure satisfactory results (Ramsay and Silverman, ). An alternative approach is to pick λ so as to balance the smoothness and the data fit (Wang et al., ). In particular, one can measure the degree of smoothness of the spline via its distance to the smoothest possible fit, a straight line through the data. The data fit, on the other hand, can be measured as the distance between the spline and the actual data points. One then chooses a value of λ that balances the two. We investigate and compare different smoothing parameters for our dataset in what follows.
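As a rough illustration of the balancing idea, the following Python sketch fits a penalized spline to simulated bid data and scans a grid of λ values. It rests on several assumptions that are not taken from the chapter: the spline uses a truncated power basis with a ridge-type penalty on every coefficient except the intercept and slope (so that a very large λ reproduces the least-squares straight line), the knots sit at quantiles of the bid times so that they crowd into the final moments of the auction, and "balance" is read as the λ at which the distance to the data and the distance to the straight line are closest. The helper names (`spline_basis`, `penalized_fit`, `balanced_lambda`) and the simulated auction are hypothetical.

```python
import numpy as np


def spline_basis(t, knots, order):
    """Truncated power basis of a spline of the given order (degree = order - 1)."""
    if order < 2:
        raise ValueError("order must be at least 2")
    deg = order - 1
    poly = np.vander(t, deg + 1, increasing=True)                   # 1, t, ..., t^deg
    trunc = np.clip(t[:, None] - knots[None, :], 0.0, None) ** deg  # (t - kappa_k)_+^deg
    return np.hstack([poly, trunc])


def penalized_fit(t, y, knots, order, lam):
    """Fitted values of a ridge-penalized regression spline at the observation times."""
    X = spline_basis(t, knots, order)
    P = np.eye(X.shape[1])
    P[0, 0] = P[1, 1] = 0.0  # intercept and slope are not penalized
    coef = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
    return X @ coef


def balanced_lambda(t, y, knots, order, lambdas):
    """Return the lambda whose lack of fit and distance to a straight line are closest."""
    line = np.polyval(np.polyfit(t, y, 1), t)  # smoothest possible fit
    scores = []
    for lam in lambdas:
        fitted = penalized_fit(t, y, knots, order, lam)
        lack_of_fit = np.linalg.norm(y - fitted)   # distance to the data points
        roughness = np.linalg.norm(fitted - line)  # distance to the straight line
        scores.append((abs(lack_of_fit - roughness), lam, lack_of_fit, roughness))
    _, lam, lack_of_fit, roughness = min(scores)
    return lam, lack_of_fit, roughness


# Simulated auction with time rescaled to [0, 1]; bids cluster near the close.
rng = np.random.default_rng(0)
t = np.sort(rng.beta(5.0, 1.0, size=150))
y = np.log(10 + 90 * t**4) + rng.normal(0.0, 0.05, t.size)  # log bid price

# Knots at quantiles of the bid times, hence denser in the final moments.
knots = np.quantile(t, np.linspace(0, 1, 12)[1:-1])
lam, lof, rough = balanced_lambda(t, y, knots, order=4, lambdas=np.logspace(-6, 3, 40))
print(f"lambda = {lam:.3g}, lack of fit = {lof:.3f}, distance to line = {rough:.3f}")
```

Other readings of "balance" (for instance minimizing a weighted sum of the two distances) are equally plausible; the sketch only relies on both distances being measured in the units of the data and moving in opposite directions as λ grows.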

The process of moving from observed data to functional data is then as follows. For a set of $n$ functional objects, let $t_{ij}$ denote the time of the $j$th observation ($1 \le j \le n_i$) of the $i$th object ($1 \le i \le n$), and let $y_{ij}(t_{ij})$ denote the corresponding measurements. Let $f_i$ denote the penalized smoothing spline fitted to $y_{i1}, \ldots, y_{in_i}$. Then, functional data analysis is performed on the continuous curves $f_i(\cdot)$ rather than on the noisy observations $y_{i1}, \ldots, y_{in_i}$. That is, after creating the functional objects $f_i(\cdot)$, the observed data $y_{i1}, \ldots, y_{in_i}$ are discarded, and subsequent modeling, estimation and inference are based on the functional objects only.
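The pipeline described above can be sketched in Python as follows, assuming a ridge-penalized truncated-power spline as a stand-in for the chapter's penalized smoothing spline. Each hypothetical auction has its own number of bids $n_i$; after fitting, only the curve $f_i$ is kept, and every object is evaluated, together with its first derivative, on a common time grid. The class `FunctionalObject`, the function `to_functional_object`, the fixed λ and the simulated bid histories are illustrative assumptions.

```python
import numpy as np


class FunctionalObject:
    """A fitted curve f_i that can be evaluated, or differentiated, at any time."""

    def __init__(self, knots, order, coef=None):
        self.knots, self.order, self.coef = np.asarray(knots, dtype=float), order, coef

    def basis(self, s, deriv=0):
        """Truncated power basis (or its deriv-th derivative) evaluated at times s."""
        deg = self.order - 1
        if not 0 <= deriv < deg:
            raise ValueError("deriv must satisfy 0 <= deriv < order - 1")
        cols = []
        for k in range(deg + 1):  # polynomial part: d^deriv/dt^deriv of t^k
            if k < deriv:
                cols.append(np.zeros_like(s))
            else:
                cols.append(np.prod(np.arange(k - deriv + 1, k + 1)) * s ** (k - deriv))
        c = np.prod(np.arange(deg - deriv + 1, deg + 1))  # deg! / (deg - deriv)!
        for kappa in self.knots:  # truncated part: d^deriv/dt^deriv of (t - kappa)_+^deg
            cols.append(c * np.clip(s - kappa, 0.0, None) ** (deg - deriv))
        return np.column_stack(cols)

    def __call__(self, s, deriv=0):
        return self.basis(np.asarray(s, dtype=float), deriv) @ self.coef


def to_functional_object(t, y, knots, order=4, lam=1e-3):
    """Fit a ridge-penalized regression spline (a stand-in smoother) to one object's data."""
    fo = FunctionalObject(knots, order)
    X = fo.basis(np.asarray(t, dtype=float))
    P = np.eye(X.shape[1])
    P[0, 0] = P[1, 1] = 0.0  # intercept and slope are not penalized
    fo.coef = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
    return fo


rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 101)  # common time grid (auction time rescaled to [0, 1])
curves, velocities = [], []
for i in range(5):  # n = 5 hypothetical auctions
    n_i = int(rng.integers(20, 60))  # each auction has its own number of bids
    t_i = np.sort(np.concatenate([[0.0], rng.beta(4.0, 1.0, size=n_i - 1)]))  # opening bid at 0
    y_i = np.log(5 + 50 * t_i**3) + rng.normal(0.0, 0.05, n_i)  # log bid price
    f_i = to_functional_object(t_i, y_i, knots=np.quantile(t_i, [0.25, 0.5, 0.75]))
    del t_i, y_i  # the raw observations are discarded from here on
    curves.append(f_i(grid))               # f_i on the common grid
    velocities.append(f_i(grid, deriv=1))  # first derivative, e.g. price velocity
curves, velocities = np.array(curves), np.array(velocities)
mean_curve = curves.mean(axis=0)  # a cross-auction summary computed from the curves only
```

Evaluating every $f_i$ on the same grid is what makes cross-auction summaries such as the mean curve possible even though the raw bid times differ from auction to auction; it also means that any error made inside `to_functional_object` carries over silently into `curves` and `velocities`.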

One important implication of this practice is that any error or inaccuracy in the smoothing step will propagate into the inferences and conclusions drawn from the functional model. To make matters worse, the observed data are discarded after the functional data are created and are therefore often hard to retrieve, and any violation of the functional model is confounded with the error at the smoothing step. That is, it is hard to know whether a model violation is due to model misspecification or to anomalies at the smoothing step. For this reason, it is important to carefully monitor the functional object recovery process and to detect inaccuracies early in the process using appropriate tools. Although measures for evaluating the goodness of fit of the functional object to the observed data are available (such as those based on the residual sum of squares, or criteria that include the roughness penalty), it is unwise to rely on these measures alone, and visualization becomes an indispensable tool in the process.
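The goodness-of-fit measures mentioned above are easy to compute, which is part of their appeal. The sketch below assumes a ridge-penalized truncated-power spline fitted to simulated data (all names are hypothetical) and reports, for several λ values, the residual sum of squares, a criterion that adds the roughness penalty, and the generalized cross-validation (GCV) score $n\,\mathrm{RSS}/(n - \mathrm{tr}\,H)^2$, where $H$ is the hat matrix.

```python
import numpy as np


def spline_basis(t, knots, deg=3):
    """Truncated power basis: 1, t, ..., t^deg, (t - kappa_k)_+^deg."""
    poly = np.vander(t, deg + 1, increasing=True)
    return np.hstack([poly, np.clip(t[:, None] - knots[None, :], 0.0, None) ** deg])


def fit_diagnostics(t, y, knots, lam, deg=3):
    """RSS, penalized RSS, GCV score and effective degrees of freedom of one fit."""
    X = spline_basis(t, knots, deg)
    P = np.eye(X.shape[1])
    P[0, 0] = P[1, 1] = 0.0  # intercept and slope are not penalized
    A = X.T @ X + lam * P
    coef = np.linalg.solve(A, X.T @ y)
    resid = y - X @ coef
    rss = float(resid @ resid)                           # residual sum of squares
    penalized = rss + lam * float(coef @ P @ coef)       # criterion including the penalty
    edf = float(np.trace(X @ np.linalg.solve(A, X.T)))   # trace of the hat matrix
    n = len(y)
    gcv = n * rss / (n - edf) ** 2                       # generalized cross-validation
    return rss, penalized, gcv, edf


rng = np.random.default_rng(2)
t = np.sort(rng.beta(5.0, 1.0, size=120))
y = np.log(10 + 90 * t**4) + rng.normal(0.0, 0.05, t.size)
knots = np.quantile(t, np.linspace(0, 1, 12)[1:-1])  # 10 interior knots

lams = [1e-6, 1e-3, 1.0, 1e3]
results = [fit_diagnostics(t, y, knots, lam) for lam in lams]
for lam, (rss, pen, gcv, edf) in zip(lams, results):
    print(f"lambda={lam:g}: RSS={rss:.4f}  penalized={pen:.4f}  GCV={gcv:.6f}  edf={edf:.1f}")
```

The RSS can only shrink as λ decreases, so ranking fits by RSS alone always favors the roughest curve, and even GCV condenses each fit into a single number that says nothing about where along the auction the curve departs from the bids. Plotting the fitted curves over the raw bids for each candidate λ is what exposes such local problems.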

Consider Figs. . - . for illustration. The figures compare the functional objects recovered for three different smoothing scenarios. Specifically, for bidding data from different eBay online auctions, Fig. . shows the functional objects obtained from penalized smoothing splines using a spline order p ( ) and a small smoothing parameter λ . . Figure . , on the other hand, corresponds to the same spline order ( ) but a larger smoothing parameter (λ ). In Fig. . we use a spline order , a smoothing parameter λ , and a data preprocessing step via interpolation.

The exact details of the smoothing are not of interest here and can be found elsewhere (Jank and Shmueli, ). What is of interest here, though, is the fact that Figs. . - . correspond to three different approaches to recovering functional objects from the same data. The researcher could have taken any one of these three approaches and
