Database Reference
In-Depth Information
Extracting useful features from your data
Once we have completed the initial exploration, processing, and cleaning of our data, we
are ready to get down to the business of extracting actual features from the data, with
which our machine learning model can be trained.
Features refer to the variables that we use to train our model. Each row of data contains
various information that we would like to extract into a training example. Almost all ma-
chine learning models ultimately work on numerical representations in the form of a vec-
tor ; hence, we need to convert raw data into numbers.
Features broadly fall into a few categories, which are as follows:
Numerical features : These features are typically real or integer numbers, for ex-
ample, the user age that we used in an example earlier.
Categorical features : These features refer to variables that can take one of a set of
possible states at any given time. Examples from our dataset might include a user's
gender or occupation or movie categories.
Text features : These are features derived from the text content in the data, for ex-
ample, movie titles, descriptions, or reviews.
Other features : Most other types of features are ultimately represented numeric-
ally. For example, images, video, and audio can be represented as sets of numerical
data. Geographical locations can be represented as latitude and longitude or geo-
hash data.
Here we will cover numerical, categorical, and text features.
Search WWH ::




Custom Search