# Point Processing (Introduction to Video and Image Processing) Part 3

## Histogram Equalization

Histogram equalization is based on a non-linear gray-level mapping that uses the cumulative histogram.

Table 4.1 A small histogram and its cumulative histogram. i is the bin number, H[i] the height of bin i, and C[i] is the height of the ith bin in the cumulative histogram

| i    | 0 | 1 | 2 | 3  |
|------|---|---|---|----|
| H[i] | 1 | 5 | 0 | 7  |
| C[i] | 1 | 6 | 6 | 13 |

Fig. 4.14 An example of a cumulative histogram. Notice how the tall bins in the ordinary histogram translate into steep slopes in the cumulative histogram

Imagine we have a histogram H[i], where i is a bin number (between 0 and 255) and H[i] is the height of bin i. The cumulative histogram is then defined as

$$C[j] = \sum_{i=0}^{j} H[i]$$

A small example is provided in Table 4.1.
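To make the definition concrete, here is a minimal sketch that computes the cumulative histogram as a running sum over the bins. The function name `cumulative_histogram` is a hypothetical helper for illustration, and the histogram is assumed to be a plain Python list of bin heights. Running it on the four bins of Table 4.1 reproduces the C[i] row of the table.

```python
def cumulative_histogram(hist):
    """Return C where C[j] = H[0] + H[1] + ... + H[j]."""
    cumulative = []
    running_sum = 0
    for height in hist:
        running_sum += height          # add the height of the current bin
        cumulative.append(running_sum)
    return cumulative


# The histogram from Table 4.1
H = [1, 5, 0, 7]
print(cumulative_histogram(H))  # [1, 6, 6, 13]
```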

Fig. 4.14 shows a histogram together with its cumulative histogram. Where the histogram has high bins, the cumulative histogram has a steep slope, and where the histogram has low bins, the cumulative histogram has a small slope. The idea is now to use the cumulative histogram as a gray-level mapping. Pixel values located in areas of the histogram where the bins are high and dense will then be mapped to a wider interval in the output, since the slope is above 1. Conversely, the regions of the histogram where the bins are small and far apart will be mapped to a narrower interval, since the slope of the gray-level mapping is below 1.

For this to work in practice, we need to ensure that the y-axis of the cumulative histogram is in the range [0, 255]. This is done by first dividing each value on the y-axis by count, i.e., the total number of pixels in the image, and then multiplying by 255. The resulting gray-level mapping is

$$g(x,y) = \frac{255}{\text{count}} \cdot C[f(x,y)]$$

The effect of histogram equalization is illustrated in Fig. 4.15.
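The complete procedure can be sketched in a few lines: build the histogram, accumulate it, normalize it to [0, 255], and use the result as a look-up table for every pixel. The sketch makes some assumptions. The image is an 8-bit gray-scale image stored as a list of rows of integers, and `equalize` is a hypothetical name. The normalized value is truncated with integer division, although rounding would be an equally reasonable choice.

```python
def equalize(image):
    """Histogram-equalize an 8-bit gray-scale image given as a list of rows."""
    # 1. Histogram H[i] with one bin per gray level 0..255
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    count = sum(hist)  # total number of pixels in the image

    # 2. Cumulative histogram C[i]
    cumulative = []
    running_sum = 0
    for height in hist:
        running_sum += height
        cumulative.append(running_sum)

    # 3. Normalize the y-axis to [0, 255]: divide by count, multiply by 255
    #    (integer division truncates the result)
    lut = [(c * 255) // count for c in cumulative]

    # 4. Use the normalized cumulative histogram as the gray-level mapping
    return [[lut[value] for value in row] for row in image]


# A dark, low-contrast image is spread out over the output range
print(equalize([[10, 10], [12, 12]]))  # [[127, 127], [255, 255]]
```

The brightest gray level present in the image always maps to 255, because its cumulative value equals count.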

Fig. 4.15 The effect of histogram stretching and histogram equalization on an input image with both very high and very low pixel values

Fig. 4.16 An example of thresholding. Notice that it is impossible to define a perfect silhouette with the thresholding algorithm. This is in general the case

## Thresholding

One of the most fundamental point processing operations is thresholding. Thresholding is the special case where $f_1 = f_2$ in Eq. 4.11. Mathematically this is undefined, but in practice it simply means that all input values below $f_1$ are mapped to zero in the output and all input values above $f_1$ are mapped to 255. As a result, the output image contains only completely black and completely white pixels. Such an image is called a binary image (see Fig. 4.16), and this representation of an object is called the silhouette of the object.

Fig. 4.17 Ideal histogram: a clear definition of object and background. Problematic histogram: the distinction between the object and the background is harder, if not impossible

One might argue that we lose information when doing this operation. However, imagine you are designing a system where the goal is to find the position of a person in a sequence of images and use it to control some parameter in a game. In such a situation, all you are interested in is the position of the person and nothing more. Thresholding in such a way that the person is white and everything else is black would then be exactly what we need. In fact, we can say that we have removed the redundant information, or eliminated noise, in the image.

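Thresholding is normally not described in terms of a gray-level mapping, but rather as the following segmentation algorithm:

$$
g(x,y) =
\begin{cases}
0, & \text{if } f(x,y) \le T \\
255, & \text{if } f(x,y) > T
\end{cases}
$$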

where T is the threshold value. We might of course also reverse the inequalities so that every pixel below the threshold value is mapped to white and every pixel above the threshold value is mapped to black.
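A minimal sketch of the segmentation algorithm follows. It assumes a gray-scale image stored as a list of rows of integers. The function `threshold` and its `invert` flag are hypothetical names, and `invert` selects the reversed version described above. Pixels exactly equal to T go to black here; that boundary choice is a convention, not something the algorithm prescribes.

```python
def threshold(image, T, invert=False):
    """Binary image: 0 where f(x,y) <= T and 255 otherwise (swapped if invert=True)."""
    low, high = (255, 0) if invert else (0, 255)
    return [[low if value <= T else high for value in row] for row in image]


print(threshold([[10, 200], [128, 129]], T=128))               # [[0, 255], [0, 255]]
print(threshold([[10, 200], [128, 129]], T=128, invert=True))  # [[255, 0], [255, 0]]
```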

In many image processing systems, thresholding is a key step in segmenting the foreground (information) from the background (noise). To obtain a good thresholding result, the image should preferably have a bi-modal histogram. This means that the histogram should consist of two "mountains", where one mountain corresponds to the background pixels and the other to the foreground pixels. Such a histogram is illustrated to the left in Fig. 4.17. In an ideal situation like this one, the choice of threshold value is not critical. In real life, however, the two mountains are not always so nicely separated, and care must therefore be taken when choosing the threshold value.

In situations where you can influence the image acquisition process, keep this histogram in mind. In fact, one of the main purposes of designing the acquisition setup is often to achieve such a histogram. It is therefore often beneficial to develop your image processing algorithms and your setup (camera, optics, lighting, environment) in parallel.

Fig. 4.18 The box is defined by the threshold values. The box indicates the region within the RGB color cube where object pixels lie

## Color Thresholding

Color thresholding can be a powerful approach to segmenting objects in a scene. Imagine you want to detect the hands of a person in order to control some interface. This can be done in a number of ways, and the easiest might be to ask the user to wear colored gloves. If we add the restriction that the color of the gloves is present neither in the background nor anywhere else on the user, then by finding all pixels with the color of the gloves we have found the hands. This works much like the thresholding operation described in Eq. 4.13. The difference is that each of the color values of a pixel is compared to two threshold values, i.e., six threshold values in total. If every color value of a pixel lies within its threshold values, the pixel is set to white (foreground pixel); otherwise it is set to black (background pixel). The algorithm looks as follows for each pixel:

$$
g(x,y) =
\begin{cases}
255, & \text{if } R_{\min} \le R \le R_{\max} \text{ and } G_{\min} \le G \le G_{\max} \text{ and } B_{\min} \le B \le B_{\max} \\
0, & \text{otherwise}
\end{cases}
$$

where (R, G, B) are the RGB values of the pixel being processed, and $R_{\min}$ and $R_{\max}$ define the range of acceptable red values for the current pixel to be accepted as belonging to the object of interest (similarly for green and blue).

The algorithm actually corresponds to defining a box in the RGB color space and classifying a pixel as belonging to an object if it is within the box and otherwise classifying it as background. This is illustrated in Fig. 4.18.
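The box test can be sketched directly. The sketch assumes an RGB image stored as rows of `(R, G, B)` tuples and uses inclusive bounds. The names `color_threshold`, `r_range`, `g_range` and `b_range` are hypothetical, and the glove color ranges in the usage line are made-up example values, not calibrated thresholds.

```python
def color_threshold(image, r_range, g_range, b_range):
    """White (255) if (R, G, B) lies inside the box given by the six thresholds, else black (0)."""
    def inside(value, bounds):
        low, high = bounds
        return low <= value <= high

    return [
        [255 if inside(r, r_range) and inside(g, g_range) and inside(b, b_range) else 0
         for (r, g, b) in row]
        for row in image
    ]


# Hypothetical "red glove" box: strong red, little green and blue
image = [[(200, 30, 40), (30, 200, 40)]]
print(color_threshold(image, (150, 255), (0, 80), (0, 80)))  # [[255, 0]]
```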

One problem with color thresholding is its sensitivity to changes in illumination. Say you have defined your threshold values so that the system can detect the two gloved hands. If someone increases the amount of light in the room, the color will stay the same, but the intensity will change. To handle such a situation, you need to widen the threshold intervals so that they also cover the brighter (or darker) versions of the glove color. This makes the box in Fig. 4.18 larger and hence increases the risk of including non-glove pixels. In the worst case, the box will be as large as the entire RGB color cube.

Fig. 4.19 The two gray shapes in figure (a) and (b) are defined by threshold values and indicate the regions within the two color spaces where object pixels lie. (a) The rg-color space. (b) The hs-color space. (c) An example of a shape that is not well defined by threshold values. Instead a LUT should be applied

The solution is to convert the RGB color image into a representation where color and intensity are separated, and then do color thresholding on the colors only, e.g., on rg-values or hs-values. The thresholds can now be tighter, which reduces the risk of false classification. Fig. 4.19 shows the equivalent of Fig. 4.18 for the rg- and hs-representations, respectively. Regardless of which color representation is used, the problem of choosing proper threshold values remains the same.
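To see why separating color from intensity helps, here is a sketch of thresholding in rg-space. It assumes the usual normalized-RGB definition r = R/(R+G+B) and g = G/(R+G+B). The helper names `to_rg` and `rg_threshold` and the threshold ranges are hypothetical. A pure black pixel has no defined chromaticity, and the sketch simply maps it to (0, 0).

```python
def to_rg(R, G, B):
    """Normalized rg-chromaticity: r = R/(R+G+B), g = G/(R+G+B)."""
    total = R + G + B
    if total == 0:
        return 0.0, 0.0          # black pixel: chromaticity undefined, map to (0, 0)
    return R / total, G / total


def rg_threshold(image, r_range, g_range):
    """White (255) if the pixel's (r, g) lies inside the rectangle, else black (0)."""
    out = []
    for row in image:
        out_row = []
        for (R, G, B) in row:
            r, g = to_rg(R, G, B)
            inside = r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]
            out_row.append(255 if inside else 0)
        out.append(out_row)
    return out


# The same glove seen under dim and bright light, plus a green background pixel
image = [[(100, 20, 30), (200, 40, 60), (20, 100, 30)]]
print(rg_threshold(image, (0.6, 0.75), (0.1, 0.2)))  # [[255, 255, 0]]
```

Doubling all three channels leaves (r, g) unchanged. This is why the dim and the bright glove pixels fall inside the same small rectangle, whereas an RGB box would have to be stretched to cover both.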

Sometimes the colors of an object cannot easily be described by a few threshold values. This is illustrated by the banana-shaped region in Fig. 4.19(c). If you fit a box to this shape (using four threshold values), you will clearly include non-object pixels and hence get an incorrect segmentation of the object. The solution is to define a look-up table (LUT). A LUT is a table containing the color values that belong to the object of interest (in some color space). These values can be found in a training phase by manually inspecting the object of interest in a number of different images. Normally the values are treated as an image, and a morphological closing operation (see Chap. 6) is performed to obtain a smooth and coherent shape. At run-time, Eq. 4.14 is replaced by a function that takes the value of a pixel and tests whether this value is present in the LUT. If it is not, the corresponding output pixel is set to black; otherwise it is set to white.
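The two phases can be sketched as follows, using quantized rg-values as the color space. The sketch makes several assumptions. The LUT is represented as a Python set of `(r, g)` bins rather than as a 2D table, 256 bins per axis is an arbitrary choice, and the helpers `rg_bin`, `train_lut` and `apply_lut` are hypothetical names. The morphological closing step mentioned above is omitted.

```python
def rg_bin(R, G, B, bins=256):
    """Quantize the rg-chromaticity of a pixel to integer bins 0..bins-1."""
    total = R + G + B
    if total == 0:
        return (0, 0)            # black pixel: chromaticity undefined
    return (R * (bins - 1) // total, G * (bins - 1) // total)


def train_lut(object_pixels, bins=256):
    """Training phase: collect every (r, g) bin observed on known object pixels."""
    return {rg_bin(R, G, B, bins) for (R, G, B) in object_pixels}


def apply_lut(image, lut, bins=256):
    """Run-time: white if the pixel's (r, g) bin is in the LUT, black otherwise."""
    return [[255 if rg_bin(R, G, B, bins) in lut else 0 for (R, G, B) in row]
            for row in image]


lut = train_lut([(100, 20, 30)])                       # one hand-labeled glove pixel
print(apply_lut([[(200, 40, 60), (20, 100, 30)]], lut))  # [[255, 0]]
```

Because the LUT stores arbitrary sets of bins, it can represent a curved region like the one in Fig. 4.19(c). A box defined by thresholds cannot.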

No matter which color space you use for thresholding, it is often a good idea to also threshold the intensity values. If you look at the color cube, you can see that the vectors of all possible colors start in (0, 0, 0). For dark pixels these vectors are short, so they all lie close to (0, 0, 0) and close to each other. In practice, this means that it is hard to distinguish colors when the intensity is low, and it is therefore often a good idea not to process the colors of pixels with low intensity values. Likewise, color pixels with a very high intensity can also be problematic. Say we have a pixel with the RGB values (255, 250, 250). This will be interpreted as very close to white and hence as containing no color. But the real values might be (10000, 250, 250). You have no way of knowing, since the red value was saturated during image acquisition. So a strongly red pixel is incorrectly classified as (close to) white. In general, you should try to avoid saturated pixels during image acquisition, and when you do encounter them, take great care before using their color. In fact, you are virtually always better off ignoring such pixels.
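A simple pre-filter along these lines could decide, per pixel, whether its color is trustworthy before any color thresholding takes place. In the sketch below, intensity is assumed to be the mean (R+G+B)/3, and the cut-offs `min_intensity=40` and `saturation_level=255` are hypothetical example values that would have to be tuned to the actual setup. `usable_for_color` is likewise a hypothetical name.

```python
def usable_for_color(R, G, B, min_intensity=40, saturation_level=255):
    """Return False for pixels whose color should not be trusted."""
    intensity = (R + G + B) / 3          # assumed intensity measure
    if intensity < min_intensity:        # too dark: colors are hard to distinguish
        return False
    if max(R, G, B) >= saturation_level: # at least one channel is saturated
        return False
    return True


print(usable_for_color(255, 250, 250))  # False: red channel saturated
print(usable_for_color(5, 5, 10))       # False: too dark
print(usable_for_color(120, 60, 40))    # True
```

Pixels rejected by such a test can be set to background, or flagged as "unknown", before the color classification runs.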

## Thresholding in Video

When you need to threshold a single image, you can simply try all possible threshold values and see which one gives the best result. When you build a system that operates on live video, the situation is different. Imagine you have constructed a setup with a camera, some lighting, etc. You connect a monitor and look at the images being captured by the camera. If nothing is happening in the images (static scene), the images will seem to be exactly the same. But they are not. For example, if the camera is mounted on a table that moves slightly whenever someone walks nearby, the images will change slightly. Another typical source of change is that most indoor lighting is powered by alternating current, for example at 50 Hz, which means that the level of illumination changes rapidly over time. Such changes often cannot be detected by simply looking at the scene. But if you subtract two consecutive images and display the result, you can see this phenomenon. If the images were in fact exactly the same, the output image (after image subtraction) would contain only zeros and hence be black. The more non-zero pixels there are in the output image, the more "noise" is present in your setup. Another way of revealing such small changes is to calculate and visualize the histogram of each image. Either way, it is always a good idea to use one of these methods to assess the uncertainties in your image acquisition setup.
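The subtraction test can be sketched as below. It assumes two gray-scale frames of equal size stored as lists of rows, and `frame_difference_stats` is a hypothetical name. The sketch takes the absolute difference, so changes in both directions show up instead of negative values being clipped to zero. The count of non-zero pixels then gives a crude single-number measure of how "noisy" the setup is.

```python
def frame_difference_stats(frame_a, frame_b):
    """Absolute difference of two gray-scale frames and the number of non-zero pixels."""
    diff = [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    changed = sum(1 for row in diff for d in row if d != 0)
    return diff, changed


frame_1 = [[10, 10], [10, 10]]
frame_2 = [[10, 12], [9, 10]]   # e.g., flicker or a small camera vibration
print(frame_difference_stats(frame_1, frame_2))  # ([[0, 2], [1, 0]], 2)
```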

Due to these uncertainties, you always need to learn the threshold values when processing video. In this context, learning means evaluating what the right threshold value is in different situations and then selecting a representative value. Approaching threshold selection like this will help in many situations. But if you have a scenario where the lighting can change significantly, you need a different approach.

Significant changes are especially observed when sunlight enters the scene, either because the system operates outdoors or because the room where the setup is located has windows. When a cloud passes in front of the sun, an abrupt change can be seen in the images. Even without clouds, the changing position (and intensity) of the sun during the day can result in large changes accumulating over time. Further abrupt changes can appear when the auto gain is enabled, see Chap. 2. Imagine a white object entering a scene where the background is dark.

Fig. 4.20 Three images of the same scene with different illuminations and hence different histograms

As more and more of the object becomes visible in the scene, the auto gain function will decrease the brightness accordingly in order to keep the overall brightness constant. This means that the threshold value needs to be changed from image to image, often rather significantly. Such significant changes can sometimes be handled by performing histogram stretching or equalization. This only works when the changes shift the histogram (making the image brighter or darker) without changing its structure. An example of a changed structure is when light from multiple windows illuminates the objects in the scene differently over time. Fig. 4.20 shows examples of different illuminations of the same scene.
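A small experiment illustrates why stretching can compensate for a pure shift. The sketch below uses linear stretching that maps the image's own minimum and maximum onto [0, 255], with integer truncation. It assumes gray-scale images stored as lists of rows, and `stretch` is a hypothetical name. After stretching, a fixed threshold gives the same result for the original image and for a uniformly brightened copy.

```python
def stretch(image):
    """Linear histogram stretching: map [min, max] of the image onto [0, 255]."""
    values = [v for row in image for v in row]
    f1, f2 = min(values), max(values)
    if f1 == f2:                       # flat image: stretching is undefined
        return [row[:] for row in image]
    return [[(v - f1) * 255 // (f2 - f1) for v in row] for row in image]


original = [[10, 20], [30, 40]]
brighter = [[v + 50 for v in row] for row in original]  # same histogram, shifted
print(stretch(original))  # [[0, 85], [170, 255]]
print(stretch(brighter))  # [[0, 85], [170, 255]]
```

If the illumination instead changes the shape of the histogram, for example by lighting parts of the scene differently, the stretched images no longer agree and a fixed threshold will fail.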