Obtaining, Processing, and Preparing Data with Spark - Machine Learning with Spark


Numerical features

What is the difference between any old number and a numerical feature? In principle, any numerical data can be used as an input variable. However, in a machine learning model, we learn a vector of weights, one for each feature. The weights play a role in mapping feature values to an outcome or target variable (in the case of supervised learning models). Thus, we want to use features that make sense, that is, where the model can learn the relationship between feature values and the target variable. For example, age might be a reasonable feature. Perhaps there is a direct relationship between increasing age and a certain outcome. Similarly, height is a good example of a numerical feature that can be used directly.
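To make the role of the weights concrete, here is a minimal sketch in plain Python of how a linear model might combine raw numerical features with one learned weight per feature. The function name `predict`, the weight values, and the feature values are all hypothetical and chosen only for illustration; a real model would learn its weights from training data.

```python
def predict(weights, features, intercept=0.0):
    """Return the weighted sum of the feature values plus an intercept."""
    return intercept + sum(w * x for w, x in zip(weights, features))

# Hypothetical weights, one per feature: [age, height].
weights = [0.5, 2.0]

# One example record: age in years, height in metres (made-up values).
features = [30.0, 1.5]

score = predict(weights, features)
print(score)
```

Because each feature value is multiplied directly by its weight, a feature is only useful in this form if larger or smaller values genuinely correspond to a change in the outcome.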

We will often see that numerical features are less useful in their raw form, but can be turned into representations that are more useful. Location is an example of such a case. Using raw locations (say, latitude and longitude) might not be that useful unless our data is very dense indeed, since our model might not be able to learn about a useful relationship between the raw location and an outcome. However, a relationship might exist between some aggregated or binned representation of the location (for example, a city or country) and the outcome.
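As a rough illustration of binning, the sketch below maps raw latitude and longitude pairs onto coarse grid cells, so that nearby points share the same categorical value. The helper `bin_location` and the one-degree cell size are assumptions made for this example; mapping coordinates to an actual city or country would need a lookup against geographic data, which is not shown here.

```python
import math

def bin_location(lat, lon, cell_size_deg=1.0):
    """Map a raw (latitude, longitude) pair to the index of a coarse grid cell."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))

# Three sample coordinates: two close together, one further away.
points = [(51.5074, -0.1278), (51.5155, -0.0922), (48.8566, 2.3522)]

cells = [bin_location(lat, lon) for lat, lon in points]
print(cells)
```

The first two points fall into the same cell while the third lands in a different one, so a model can learn one relationship per cell instead of trying to relate the outcome to the exact coordinates.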
