Time Stamps and Financial Modeling - Doing Data Science

then the mean or the variance of that transformed data). This ends up

being a submodel of our model.

Transforming Your Data

Outside the context of financial data, preparing and transforming data is also a big part of the process. You have a number of possible techniques to choose from to make your data “behave” better:

• Normalize the data by subtracting the mean and dividing by the

standard deviation.

• Alternatively, normalize or scale the data by dividing by the maximum value.

• Take the log of the data.

• Bucket the data into five evenly spaced buckets (equal width) or five evenly distributed buckets (roughly equal counts), or a number other than five, and create a categorical variable from that.

• Choose a meaningful threshold and transform the data into a new binary variable with value 1 if a data point is greater than or equal to the threshold, and 0 if it is less than the threshold.
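The transformations listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the function names (`zscore`, `scale_by_max`, and so on) and the sample `data` array are hypothetical and chosen only for this example.

```python
import numpy as np

def zscore(x):
    """Subtract the mean and divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def scale_by_max(x):
    """Divide every value by the maximum value."""
    x = np.asarray(x, dtype=float)
    return x / x.max()

def log_transform(x):
    """Natural log; assumes all values are strictly positive."""
    return np.log(np.asarray(x, dtype=float))

def bucket_evenly_spaced(x, n_buckets=5):
    """Equal-width buckets between min and max; labels run 0..n_buckets-1."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_buckets + 1)
    # Use only the interior edges so the maximum falls in the last bucket.
    return np.digitize(x, edges[1:-1])

def bucket_evenly_distributed(x, n_buckets=5):
    """Quantile buckets, each holding roughly the same number of points."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, n_buckets + 1))
    return np.digitize(x, edges[1:-1])

def binarize(x, threshold):
    """1 where the value is >= threshold, 0 otherwise."""
    return (np.asarray(x, dtype=float) >= threshold).astype(int)

# Hypothetical sample with one large outlier.
data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
spaced = bucket_evenly_spaced(data)
distributed = bucket_evenly_distributed(data)
flags = binarize(data, threshold=3.0)
```

With an outlier like the last value in `data`, the evenly spaced buckets put almost every point in the first bucket, while the evenly distributed buckets spread the points out. That contrast is one reason to try both.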

Once we have estimates of our mean $\bar{y}$ and variance $\sigma_y^2$, we can normalize the next data point with these estimates, just as we do to get from a Gaussian or normal distribution to the standard normal distribution with mean = 0 and standard deviation = 1:

$$y \mapsto \frac{y - \bar{y}}{\sigma_y}$$
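As a rough sketch of this step, the snippet below estimates the mean and standard deviation from past observations only and then normalizes a new point with those estimates. The function name `normalize_next` and the `history` values are hypothetical, and using the plain (population) standard deviation over the whole history is an assumption; a real pipeline might weight recent data differently.

```python
import numpy as np

def normalize_next(history, y_next):
    """Normalize y_next using estimates computed from history only."""
    history = np.asarray(history, dtype=float)
    y_bar = history.mean()     # estimate of the mean
    sigma_y = history.std()    # estimate of the standard deviation (ddof=0, an assumption)
    return (y_next - y_bar) / sigma_y

# Hypothetical past observations: mean 5.0, standard deviation 2.0.
history = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = normalize_next(history, 7.0)   # one standard deviation above the mean
```

The new point is deliberately kept out of the estimates, so the normalization uses only information that was available before the point arrived.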

Of course, we may have other things to keep track of as well to prepare our data, and we might run other submodels of our model. For example, we may choose to consider only the “new” part of something, which is equivalent to trying to predict something like $y_t - y_{t-1}$ instead of $y_t$. Or we may train a submodel to figure out what part of $y_{t-1}$ predicts $y_t$, such as a univariate regression.
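Both ideas can be illustrated with a short sketch, assuming NumPy and a made-up series `y`. The first part takes differences to isolate the "new" part; the second fits a univariate regression of the current value on the previous value and keeps the residual as the part the previous value does not explain. The variable names are hypothetical, and ordinary least squares via `np.polyfit` is just one possible choice of submodel.

```python
import numpy as np

# Hypothetical time series.
y = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 13.5])

# The "new" part: first differences, y_t - y_{t-1}.
dy = np.diff(y)

# Submodel: univariate regression of y_t on y_{t-1}.
prev, curr = y[:-1], y[1:]
slope, intercept = np.polyfit(prev, curr, 1)  # highest-degree coefficient first

# Part of y_t predicted by y_{t-1}, and what is left over.
predicted = intercept + slope * prev
residual = curr - predicted
```

Either `dy` or `residual` could then be passed to the main model in place of the raw series, depending on the goal.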

There are lots of choices here, which will always depend on the situation and the goal you happen to have. Keep in mind, though, that it's
