Normalizing features
Once the features have been extracted into vector form, a common preprocessing
step is to normalize the numerical data. The idea is to transform each numerical
feature so that it is scaled to a standard size. We can perform different kinds of
normalization, which are as follows:
Normalize a feature: This is usually a transformation applied to an individual
feature across the dataset, for example, subtracting the mean (centering the
feature) or applying the standard normal transformation (such that the feature
has a mean of zero and a standard deviation of 1).
Normalize a feature vector: This is usually a transformation applied to all
features in a given row of the dataset such that the resulting feature vector has a
normalized length. That is, we scale every feature in the vector by the same factor
so that the vector has a norm of 1 (typically the L1 or L2 norm).
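The difference between the two cases is easiest to see on a small matrix. The following is a minimal sketch rather than a definitive recipe; the toy array data and the variable names are made up for illustration, and the standard deviation used is NumPy's default population form (ddof=0):

```python
import numpy as np

# Hypothetical toy dataset: 4 rows (examples) x 3 columns (features).
data = np.array([
    [1.0, 200.0, 0.5],
    [2.0, 400.0, 0.1],
    [3.0, 600.0, 0.9],
    [4.0, 800.0, 0.3],
])

# Case 1: normalize each feature (column) across the dataset so that
# it has a mean of zero and a standard deviation of 1.
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)  # population standard deviation (ddof=0)
standardized = (data - col_mean) / col_std

# Case 2: normalize each feature vector (row) to unit L2 norm.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_rows = data / row_norms

print("Column means after standardizing:\n%s" % standardized.mean(axis=0))
print("Row norms after normalizing:\n%s" % np.linalg.norm(unit_rows, axis=1))
```

Note that the first case works down the columns (axis=0), while the second works along the rows (axis=1).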
We will use the second case as an example. We can use the norm function in numpy's
linalg module (np.linalg.norm) to normalize a vector: we first compute the L2 norm of
a random vector and then divide each element of the vector by this norm to create our
normalized vector:
import numpy as np

np.random.seed(42)  # fix the seed so that the result is reproducible
x = np.random.randn(10)
norm_x_2 = np.linalg.norm(x)  # computes the L2 norm by default
normalized_x = x / norm_x_2
print("x:\n%s" % x)
print("2-Norm of x: %2.4f" % norm_x_2)
print("Normalized x:\n%s" % normalized_x)
print("2-Norm of normalized_x: %2.4f" % np.linalg.norm(normalized_x))
This should give the following result (note that in the preceding code snippet, we set the
random seed equal to 42 so that the result will always be the same):
x:
[ 0.49671415 -0.1382643 0.64768854 1.52302986 -0.23415337
-0.23413696 1.57921282 0.76743473 -0.46947439 0.54256004]
2-Norm of x: 2.5908
Normalized x:
[ 0.19172213 -0.05336737 0.24999534 0.58786029
-0.09037871 -0.09037237 0.60954584 0.29621508 -0.1812081
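
The same approach works for the L1 norm mentioned earlier. The sketch below wraps the division in a small helper; the function name normalize, its order parameter, and the choice to return a zero vector unchanged are assumptions made for illustration, not part of numpy:

```python
import numpy as np

def normalize(v, order=2):
    """Scale v to unit norm; order=1 gives the L1 norm, order=2 the L2 norm."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, ord=order)
    if norm == 0.0:
        # Assumed convention: a zero vector cannot be scaled, so return it as is.
        return v
    return v / norm

np.random.seed(42)
x = np.random.randn(10)
x_l1 = normalize(x, order=1)
x_l2 = normalize(x, order=2)
print("1-Norm of x_l1: %2.4f" % np.linalg.norm(x_l1, ord=1))
print("2-Norm of x_l2: %2.4f" % np.linalg.norm(x_l2))
```

Both printed norms should be 1.0000. Under the L1 norm, the absolute values of the elements sum to 1, while under the L2 norm, the squares of the elements sum to 1.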