Time Stamps and Financial Modeling - Doing Data Science


This generalizes to any model as well: you plot the cumulative sum of the product of demeaned forecast and demeaned realized. (A demeaned value is one where the mean's been subtracted.) In other words, you see if your model consistently does better than the “stupidest” model of assuming everything is average.

If you plot this and the curve drifts up and to the right, you're good. If it's too jagged, that means your model is taking big bets and isn't stable.
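As a rough illustration of the plot described above, here is a minimal sketch in Python. The function name, the synthetic data, and the choice to demean with the full-sample mean are all assumptions made for this example, not something the text specifies.

```python
import numpy as np

def cumulative_demeaned_product(forecast, realized):
    """Running sum of (forecast - mean) * (realized - mean)."""
    forecast = np.asarray(forecast, dtype=float)
    realized = np.asarray(realized, dtype=float)
    # Assumption: demean each series with its own full-sample mean.
    # In live trading you would only know a running mean.
    f = forecast - forecast.mean()
    r = realized - realized.mean()
    return np.cumsum(f * r)

# Made-up data: a weak signal buried in noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=500)
realized = 0.1 * signal + rng.normal(size=500)

curve = cumulative_demeaned_product(signal, realized)
```

Plotting `curve` against time (for example with matplotlib's `plt.plot(curve)`) gives the picture to inspect: a steady drift up and to the right suggests the forecast adds something over the "everything is average" baseline, while large jumps suggest a few big bets dominate.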

Why Regression?

So now we know that in financial modeling, the signal is weak. If you imagine there's some complicated underlying relationship between your information and the thing you're trying to predict, give up on knowing what that is: there's too much noise to find it. Instead, think of the function as possibly complicated, but continuous, and imagine you've written it out as a Taylor series. Then you can't possibly expect to get your hands on anything but the linear terms.
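To spell out the Taylor series argument, here is a sketch in notation that is assumed for illustration only: $f$ is the unknown relationship, $x$ is your information, $x_0$ is a reference point, and $f$ is taken to be smooth enough to expand (a stronger assumption than continuity).

```latex
f(x) = f(x_0) + \nabla f(x_0)^{\top} (x - x_0)
     + \tfrac{1}{2} (x - x_0)^{\top} H_f(x_0) \, (x - x_0) + \cdots
```

The claim is that, in a noisy setting, only the constant and first-order terms can realistically be estimated, which leaves a model of the form $y \approx \beta_0 + \beta^{\top} x$, that is, linear regression.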

Don't think about using logistic regression, either, because you'd need to ignore size, which matters in finance: it matters if a stock went up 2% instead of 0.01%. Logistic regression forces you to have an on/off switch, which would be possible but would lose a lot of information. Considering that we are always in a low-information environment, this is a bad idea.

Note that although we're claiming you probably want to use linear regression in a noisy environment, the actual terms themselves don't have to be linear in the information you have. You can always take products of various terms as x's in your regression, but you're still fitting a linear model in nonlinear terms.
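A minimal sketch of this idea, using NumPy's least-squares solver: the data-generating process, the coefficient values, and the variable names below are invented for the example. The noise is kept unrealistically small so the fit is easy to check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical relationship that includes an interaction (product) term.
y = 0.5 * x1 - 0.3 * x2 + 0.2 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, the raw terms, and their product.
# The product column is nonlinear in the inputs, but the model is
# still linear in the coefficients, so ordinary least squares applies.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `beta` holds the fitted intercept and the three coefficients in column order. The product `x1 * x2` is treated as just another x.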

Adding Priors

One interpretation of priors is that they can be thought of as opinions that are mathematically formulated and incorporated into our models. In fact, we've already encountered a common prior in the form of downweighting old data. The prior can be described as “new data is more important than old data.”
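One common way to express "new data is more important than old data" is to give each observation a weight that decays with its age and then fit a weighted regression. The sketch below assumes exponential decay with a hypothetical `half_life` parameter; the text does not say which weighting scheme is intended, and both function names are made up for this example.

```python
import numpy as np

def exponential_weights(n, half_life):
    """Weights for n observations ordered oldest to newest."""
    age = np.arange(n - 1, -1, -1)  # the newest observation has age 0
    return 0.5 ** (age / half_life)

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - X_i @ beta)**2."""
    # Scaling each row by sqrt(w_i) turns the weighted problem
    # into an ordinary least-squares problem.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Tiny made-up example: y = 1 + 2x exactly, so the weights do not
# change the answer, but the mechanics are the same on real data.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
beta = weighted_least_squares(X, y, exponential_weights(len(x), half_life=3))
```

A shorter `half_life` makes the fit react faster to recent data at the cost of using fewer effective observations.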

Besides that one, we may also decide to consider something like “coefficients vary smoothly.” This is relevant when we decide, say, to use a bunch of old values of some time series to help predict the next one, giving us a model like:
