Geometric Transformations (Introduction to Video and Image Processing) Part 1

Most people have tried to do a geometric transformation of an image when preparing a presentation or when manipulating an image. The two most well-known are perhaps rotation and scaling, but others exist. In this topic we will describe how such transformations operate and discuss the issues that need to be considered when doing such transformations.

The term “geometric” transformation refers to the class of image transformations where the geometry of the image is changed but the actual pixel values remain unchanged.

Let us recall from the previous topics that an image is defined as f(x, y), where f(·) denotes the intensity or gray-level value and (x, y) defines the position of the pixel. After a geometric transformation the image is transformed into a new image denoted g(x′, y′), where the prime (′) marks a position in g. This might seem confusing, but we need some way of distinguishing the position before the transformation, (x, y), from the position after it, (x′, y′).

As mentioned above, the actual intensity values are not changed by the geometric transformation, but the positions of the pixels are (from (x, y) to (x′, y′)). So if f(2, 3) = 120 then in general g(2, 3) ≠ 120. A geometric transformation basically calculates where the pixel at position (x, y) in f(x, y) will be located in g(x′, y′), that is, a mapping from (x, y) to (x′, y′). We denote this mapping as

$$x' = A_x(x, y), \qquad y' = A_y(x, y)$$

where Ax(x, y) and Ay(x, y) are both functions, which map from the position (x, y) to x′ and y′, respectively.

Affine Transformations

The class of affine transformations covers four different transformations, which are illustrated in Fig. 10.1. These are: translation, rotation, scaling and shearing.

Translation

Let us now look at the transformations in Fig. 10.1 and define their concrete mapping equations. Translation is simply a matter of shifting the image horizontally and vertically with a given offset (measured in pixels) denoted Δx and Δy. For translation the mapping is thus defined as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$

So if Δx = 100 and Δy = 100 then each pixel is shifted 100 pixels in both the x- and y-direction.

Scaling

When scaling an image, it is made smaller or bigger in the x- and/or y-direction. Say we have an image of size 300 x 200 and we wish to transform it into a 600 x 100 image. The x-direction is then scaled by 600/300 = 2. We denote this the x-scale factor and write it as Sx = 2. Similarly, Sy = 100/200 = 1/2. Together this means that the pixel in the image f(x, y) at position (x, y) = (100, 100) is mapped to a new position in the image g(x′, y′), namely (x′, y′) = (100 · 2, 100 · 1/2) = (200, 50). In general, scaling is expressed as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} S_x & 0 \\ 0 & S_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{10.4}$$
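The worked example above can be checked with a few lines of Python. This is only a minimal sketch, assuming scaling multiplies each coordinate by its own factor (x′ = x · Sx, y′ = y · Sy); the function name `scale_point` is a hypothetical helper, not something from the text.

```python
def scale_point(x, y, sx, sy):
    """Map position (x, y) in f to (x', y') in g by scaling each axis."""
    return x * sx, y * sy


# Scale factors for resizing a 300 x 200 image to 600 x 100.
sx = 600 / 300  # 2.0
sy = 100 / 200  # 0.5

print(scale_point(100, 100, sx, sy))  # (200.0, 50.0)
```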

Rotation

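When rotating an image, as illustrated in Fig. 10.1(d), we need to define the amount of rotation in terms of an angle. We denote this angle θ, meaning that each pixel in f(x, y) is rotated θ degrees about the origin. The transformation is defined as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Fig. 10.1 Different transformations

Note that the rotation is done counterclockwise since the y-axis is pointing downwards: for θ = 90° the pixel at (1, 0) is mapped to (0, −1), i.e., straight up in the image. If we wish to do a clockwise rotation we can either use −θ or change the transformation to

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$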
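The direction of rotation is easy to get wrong when the y-axis points downwards, so a small numerical check may help. The sketch below assumes the counterclockwise form x′ = x cos θ + y sin θ, y′ = −x sin θ + y cos θ (rotation about the origin, image coordinates with y pointing down); `rotate_point` and its `clockwise` flag are hypothetical names chosen for illustration.

```python
import math


def rotate_point(x, y, theta_deg, clockwise=False):
    """Rotate (x, y) about the origin in image coordinates (y-axis down)."""
    t = math.radians(theta_deg)
    if clockwise:
        t = -t  # a clockwise turn is a counterclockwise turn by -theta
    x_new = x * math.cos(t) + y * math.sin(t)
    y_new = -x * math.sin(t) + y * math.cos(t)
    return x_new, y_new


# A point to the right of the origin moves straight up (negative y) on screen.
x_new, y_new = rotate_point(1, 0, 90)
print(round(x_new, 6), round(y_new, 6))  # 0.0 -1.0
```

With `clockwise=True` the same point ends up at (0, 1), i.e., below the origin on screen.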

Shearing

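To shear an image means to shift pixels either horizontally, by a factor Bx, or vertically, by a factor By. The difference from translation is that the shift is not the same for every pixel, but depends on where in the image a pixel is. In Fig. 10.1(e) Bx = −0.5 and By = 0. The transformation is defined as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & B_x \\ B_y & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$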
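To see how the shift depends on the pixel's position, consider the following minimal sketch. It assumes the shear mapping x′ = x + Bx · y, y′ = By · x + y and uses the values Bx = −0.5, By = 0 quoted for Fig. 10.1(e); `shear_point` is a hypothetical helper name.

```python
def shear_point(x, y, bx, by):
    """Shear (x, y): the horizontal shift grows with y, the vertical with x."""
    return x + bx * y, by * x + y


# The same column (x = 50) is shifted further left the lower the row is.
for y in (0, 100, 200):
    print(y, shear_point(50, y, bx=-0.5, by=0))
```

Pixels in row 0 stay where they are, while pixels in row 200 are moved 100 pixels to the left, which is what gives the sheared image its slanted look.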

Combining the Transformations

The four transformations can be combined in all kinds of different ways by multiplying the matrices in different orders, yielding a number of different transformations. One is shown in Fig. 10.1(f). Instead of defining the scale factors, the shearing factors and the rotation angle, it is common to merge these three transformations into one matrix. The combination of the four transformations is therefore defined as

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ b_1 & b_2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} a_3 \\ b_3 \end{bmatrix} \tag{10.8}$$

and this is the affine transformation. Below the relationships between Eq. 10.8 and the four above mentioned transformations are listed.

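	a1	a2	a3	b1	b2	b3
Translation	1	0	Δx	0	1	Δy
Scaling	Sx	0	0	0	Sy	0
Rotation	cos θ	sin θ	0	−sin θ	cos θ	0
Shearing	1	Bx	0	By	1	0

Here Δx and Δy are the translation offsets, and the rotation row is the counterclockwise form.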
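The table can be read as a recipe for filling in the six affine coefficients. The sketch below is one possible way to express that in Python, assuming the affine mapping x′ = a1·x + a2·y + a3, y′ = b1·x + b2·y + b3 and the counterclockwise rotation convention for a y-axis pointing down; all function names are hypothetical.

```python
import math


def translation(dx, dy):
    return (1, 0, dx, 0, 1, dy)


def scaling(sx, sy):
    return (sx, 0, 0, 0, sy, 0)


def rotation(theta_deg):
    t = math.radians(theta_deg)
    # Assumed convention: counterclockwise on screen with the y-axis down.
    return (math.cos(t), math.sin(t), 0, -math.sin(t), math.cos(t), 0)


def shearing(bx, by):
    return (1, bx, 0, by, 1, 0)


def apply_affine(coeffs, x, y):
    """Evaluate the affine mapping for coefficients (a1, a2, a3, b1, b2, b3)."""
    a1, a2, a3, b1, b2, b3 = coeffs
    return a1 * x + a2 * y + a3, b1 * x + b2 * y + b3


print(apply_affine(translation(100, 100), 10, 20))  # (110, 120)
print(apply_affine(scaling(2, 0.5), 100, 100))      # (200, 50.0)
```

Keeping every transformation in the same six-coefficient form means a single `apply_affine` routine can serve all four cases.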

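Often homogeneous coordinates are used when implementing the transformation, since the translation then becomes part of the matrix product and a sequence of transformations can be merged into a single 3 x 3 matrix, which makes further calculations faster. In homogeneous coordinates, the affine transformation becomes

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{10.9}$$

where a3 = Δx and b3 = Δy are the translation offsets.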
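A practical benefit of the homogeneous form is that several transformations can be multiplied into one 3 x 3 matrix before any pixel is touched. The following sketch illustrates this with plain nested lists (no external library is assumed); `matmul3` and `apply_h` are hypothetical helper names. It also shows that the order of multiplication matters.

```python
def matmul3(m, n):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(m, x, y):
    """Apply a 3x3 homogeneous matrix to the point (x, y, 1)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


translate = [[1.0, 0.0, 100.0], [0.0, 1.0, 100.0], [0.0, 0.0, 1.0]]
scale = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]

# The matrix applied first stands to the right in the product.
scale_then_translate = matmul3(translate, scale)
translate_then_scale = matmul3(scale, translate)

print(apply_h(scale_then_translate, 100, 100))  # (300.0, 150.0)
print(apply_h(translate_then_scale, 100, 100))  # (400.0, 100.0)
```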

Fig. 10.2 (a) Forward mapping. (b) Backward mapping

Making It Work in Practice

In terms of programming, the affine transformation consists of two steps. First the coefficients of the affine transformation matrix are defined. Second we go through all pixels in the image f(x, y) one at a time (using two for-loops as seen in Sect. 4.7) and for each pixel we find its new position in g(x′, y′) using Eq. 10.9. This process is known as forward mapping, i.e., mapping each pixel from f(x, y) to g(x′, y′), see Fig. 10.2(a).
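The two steps can be sketched as follows. This is an illustrative implementation only, assuming the image is a list of rows of gray-level values, the coefficients are given as (a1, a2, a3, b1, b2, b3), and computed positions are rounded to the nearest pixel; the function name `forward_map` and the `background` parameter are assumptions made for the example.

```python
def forward_map(f, coeffs, out_width, out_height, background=0):
    """Forward-map image f (a list of rows) through an affine transformation."""
    a1, a2, a3, b1, b2, b3 = coeffs
    g = [[background] * out_width for _ in range(out_height)]
    for y in range(len(f)):          # loop over rows of f
        for x in range(len(f[0])):   # loop over columns of f
            # New position of the pixel, rounded to the pixel grid.
            x_new = round(a1 * x + a2 * y + a3)
            y_new = round(b1 * x + b2 * y + b3)
            # Pixels that land outside g are simply dropped.
            if 0 <= x_new < out_width and 0 <= y_new < out_height:
                g[y_new][x_new] = f[y][x]
    return g


f = [[1, 2, 3],
     [4, 5, 6]]

# Translation by one pixel in both directions: a3 = 1, b3 = 1.
g = forward_map(f, (1, 0, 1, 0, 1, 1), out_width=5, out_height=4)
for row in g:
    print(row)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer, so a different rounding rule would place a few pixels differently.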

At first glance this simple process seems to be fine, but unfortunately it is not! Let us have a closer look at the scaling transformation in order to understand the nature of the problem. Say we have an image of size 300 x 200 and want to scale this to 510 x 200. From above we can calculate the scaling factors as Sx = 510/300 = 1.7 and Sy = 200/200 = 1. Using Eq. 10.4 the pixel positions in a row of f(x, y) are mapped in the following manner:

x	0	1	2	3	4	5	6	7	8	…	300
x′	0	1.7	3.4	5.1	6.8	8.5	10.2	11.9	13.6	…	510

We can observe that “holes” are present in g(x′, y′). If for example 10.2 is rounded off to 10 and 11.9 to 12, then x′ = 11 will have no value, hence a hole in the output image. In Fig. 10.3 we have used forward mapping to scale the image in Fig. 10.1(a). The holes can be seen as the black pattern.
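The holes can be counted directly for one row of the example. The sketch below assumes source positions x = 0, …, 299 and output positions x′ = 0, …, 509, with each scaled position rounded to the nearest integer; the variable names are chosen for illustration.

```python
src_width, dst_width = 300, 510
sx = dst_width / src_width  # 1.7

# Output positions that receive a value under forward mapping.
filled = {round(x * sx) for x in range(src_width)}

# Output positions that no source pixel is mapped to.
holes = [x_new for x_new in range(dst_width) if x_new not in filled]

print(holes[:3])   # [1, 4, 6]
print(11 in holes)  # True
print(len(holes))   # 210
```

Since only 300 source pixels are available to fill 510 output positions, 210 positions in every row are necessarily left empty, whatever rounding rule is used.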

If the scaling factor is smaller than 1 then a related problem occurs, namely that multiple pixels from f(x, y) are mapped to the same pixel in g(x′, y′). This is not critical in terms of what the output looks like, but mapping multiple pixels to the same pixel in g(x′, y′) is computationally inefficient, since all but the last value written are discarded. Both issues can arise in any geometric transformation.
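The opposite situation can be illustrated the same way. This sketch assumes a scale factor of 0.5 applied to a row of 300 pixels and, to keep the counts tidy, truncates the scaled position with `math.floor` instead of rounding (an assumption made for this example only).

```python
import math
from collections import Counter

sx = 0.5
src_width = 300

# Count how many source pixels land on each output position.
hits = Counter(math.floor(x * sx) for x in range(src_width))

print(len(hits))          # 150
print(set(hits.values()))  # {2}
```

Every one of the 150 output positions is written twice, so half of the mapping work is overwritten by a later pixel.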

Fig. 10.3 Image scaling using forward mapping. Notice the black pattern, which is a result of the inherent problem related to forward mapping
