Introduction to Video and Image Processing

If you look at the image in Fig. 1.1 you can see three children. The two oldest children look content with life, while the youngest child looks a bit puzzled. We can detail this description further using adjectives, but we will never be able to present a textual description that encapsulates all the details in the image. This is what is normally meant by the saying “a picture is worth a thousand words”.

So, our eyes and our brain are capable of extracting detailed information far beyond what can be described in text, and it is this ability we want to replicate in the “seeing computer”. To this end, a camera replaces the eyes and the (video and image) processing software replaces the human brain. The purpose of this topic is to present the basics of these two areas: cameras and video/image processing.

Cameras have been around for many years and were initially developed with the purpose of “freezing” a part of the world, for example to be used in newspapers. For a long time cameras were analog, meaning that the video and images were captured on film. As digital technology matured, the possibility of digital video and images arose, and video and image processing became relevant and necessary sciences.

Fig. 1.1 An image containing three children

Some of the first applications of digital video and image processing were to improve the quality of the captured images, but as the power of computers grew, so did the number of applications where video and image processing could make a difference. Today, video and image processing are used in many diverse applications, such as astronomy (to enhance the quality of the images), medicine (to measure and understand some parameters of the human body, e.g., blood flow in fractured veins), image compression (to reduce the memory requirement when storing an image), sports (to capture the motion of an athlete in order to understand and improve the performance), rehabilitation (to assess the locomotion abilities), motion pictures (to capture actors’ motion in order to produce special effects based on graphics), surveillance (to detect and track individuals and vehicles), production industries (to assess the quality of products), robot control (to detect objects and their pose so a robot can pick them up), TV productions (to mix graphics and live video, e.g., weather forecast), biometrics (to measure some unique parameters of a person), and photo editing (to improve the quality of photographs or add effects to them).

Many of these applications rely on the same video and image processing methods, and it is these basic methods which are the focus of this topic.

The Different Flavors of Video and Image Processing

The different video and image processing methods are often grouped into the categories listed below. There is no unique definition of the different categories and, to make matters worse, they also overlap significantly. Here is one set of definitions:

Video and Image Compression This is probably the best-defined category and contains the group of methods used for compressing video and image data.

Image Manipulation This category covers methods used to edit an image, for example rotating or scaling an image, but also improving its quality by, for example, changing the contrast.

Image Processing Image processing originates from the more general field of signal processing and covers methods used to segment the object of interest. Segmentation here refers to methods which in some way enhance the object while suppressing the rest of the image (for example, by enhancing the edges in an image).

Video Processing Video processing covers most of the image processing methods, but also includes methods where the temporal nature of video data is exploited.

Image Analysis Here the goal is to analyze the image with the purpose of first finding objects of interest and then extracting some parameters of these objects, for example an object’s position and size.

Machine Vision When video processing, image processing, or image analysis is applied in production industries, it is normally referred to as machine vision or simply vision.

Computer Vision Humans have human vision and similarly a computer has computer vision. When talking about computer vision we normally mean advanced algorithms for tasks that a human can perform, e.g., face recognition. Normally computer vision also covers all methods where more than one camera is applied.
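To make a few of the categories above a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under our own assumptions: an image is represented as a list of rows of gray-scale values between 0 and 255, and the function names and the change threshold of 30 are hypothetical choices, not methods prescribed by the text. It shows one possible contrast change (image manipulation), one simple way of exploiting the temporal nature of video by comparing two frames (video processing), and the extraction of an object's position and size (image analysis).

```python
# Illustrative sketch: an image is a list of rows of gray-scale values (0..255).

def stretch_contrast(image):
    """Image manipulation: map the darkest pixel to 0 and the brightest to 255."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # uniform image, nothing to stretch
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def frame_difference(previous, current, threshold=30):
    """Video processing: mark (1) pixels that changed by more than `threshold`.

    The default threshold is an arbitrary value chosen for this example.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def bounding_box(mask):
    """Image analysis: (x, y, width, height) of the marked pixels, or None."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# A low-contrast 2x2 image, and two 2x3 video frames in which one column changes.
stretched = stretch_contrast([[100, 110], [120, 150]])
frame_a = [[10, 10, 10], [10, 10, 10]]
frame_b = [[10, 200, 10], [10, 200, 10]]
moving = frame_difference(frame_a, frame_b)
box = bounding_box(moving)
```

Note how the output of one category (the changed pixels found by comparing frames) becomes the input of another (measuring position and size), which is one reason the categories overlap.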

Fig. 1.2 The block diagram provides a general framework for many systems working with video and images

Even though this topic is titled “Video and Image Processing”, it also covers basic methods from Image Manipulation and Image Analysis in order to provide the reader with a solid foundation for understanding and working with images and video.

General Framework

No matter which category you are working within (except for Video and Image Compression) you can very often apply the framework illustrated in Fig. 1.2. Sometimes not all blocks are included in a particular system, but the framework nevertheless provides a relevant guideline.

Underneath each block in the figure we have illustrated a typical output. The particular outputs are from a gesture-based human-computer-interface system that counts the number of fingers a user is showing in front of the camera.

Below we briefly describe the purpose of the different blocks:

Image Acquisition In this block everything to do with the camera and setup of your system is covered, e.g., camera type, camera settings, optics, and light sources.

Pre-processing This block does something to your image before the actual processing commences, e.g., converting the image from color to gray-scale or cropping the most interesting part of the image (as seen in Fig. 1.2).

Segmentation This is where the information of interest is extracted from the image or video data. Often this block is the “heart” of a system. In the example in the figure the information is the fingers. The image below the segmentation block shows that the fingers (together with some noise) have been segmented (indicated by white objects).

Representation In this block the objects extracted in the segmentation block are represented in a concise manner, e.g., using a few representative numbers as illustrated in the figure.

Classification Finally, this block examines the information produced by the previous block and classifies each object as being an object of interest or not. In the example in the figure, this block determines that three finger objects are present and hence outputs this.
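The pre-processing and segmentation blocks described above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the system shown in Fig. 1.2: a color image is a list of rows of (R, G, B) tuples, the gray-scale value is taken as the simple average of the three channels (weighted averages are also common), and segmentation is done with a fixed threshold of 128. The function names and the tiny test image are hypothetical.

```python
def to_gray(rgb_image):
    """Pre-processing: convert color to gray-scale by averaging R, G and B."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_image]


def crop(image, top, left, height, width):
    """Pre-processing: keep only the region of interest."""
    return [row[left:left + width] for row in image[top:top + height]]


def threshold(gray, level):
    """Segmentation: white (1) where a pixel is brighter than `level`, else black (0)."""
    return [[1 if p > level else 0 for p in row] for row in gray]


# A made-up 3x4 color image: two bright vertical strips on a dark background.
bright, dark = (240, 240, 240), (30, 20, 10)
rgb = [
    [dark, dark,   dark, dark],
    [dark, bright, dark, bright],
    [dark, bright, dark, bright],
]

gray = to_gray(rgb)                                  # color -> gray-scale
roi = crop(gray, top=1, left=1, height=2, width=3)   # drop the empty border
mask = threshold(roi, 128)                           # bright pixels become white
```

The result is a black and white image in which the bright strips appear as white objects, which is the kind of output the segmentation block hands on to the following blocks.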

It should be noted that the different blocks might not be as clear-cut in reality as the figure suggests. One designer might place a particular method in one block while another designer will place the same method in the previous or following block. Nevertheless the framework is an excellent starting point for any video and image processing system.

The last two blocks are sometimes replaced by one block called BLOB Analysis. This is especially done when the output of the segmentation block is a black and white image as is the case in the figure. In this topic we follow this idea and have therefore merged the descriptions of these two blocks into one—BLOB Analysis. 
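As a rough illustration of what such a merged block might do, the sketch below groups the white pixels of a black and white image into connected objects, describes each object by a few numbers, and keeps only the objects that are large enough. Everything here is an assumption made for the example: the use of 4-connectivity, the flood-fill grouping, the choice of area and bounding-box size as features, the minimum area of 3 pixels, and the function names are ours and are not taken from the text.

```python
def find_blobs(mask):
    """Group 4-connected white (1) pixels; return one list of (x, y) per object."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood fill from this pixel using an explicit stack.
            seen[y][x] = True
            stack, pixels = [(x, y)], []
            while stack:
                cx, cy = stack.pop()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            blobs.append(pixels)
    return blobs


def describe(pixels):
    """Representation: a few numbers summarizing one object."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return {
        "area": len(pixels),
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
    }


def count_objects(mask, min_area=3):
    """Classification: count objects large enough to be of interest."""
    return sum(1 for blob in find_blobs(mask) if describe(blob)["area"] >= min_area)


# Three finger-like columns plus one isolated noise pixel.
mask = [
    [1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 0],
]
features = [describe(blob) for blob in find_blobs(mask)]
fingers = count_objects(mask)
```

With this made-up input the single noise pixel is found as a fourth object but rejected by the area test, so three objects are counted, mirroring the finger-counting example in the figure.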

Table 1.1 The organization and topics of the different chapters

Image Acquisition
This chapter describes what light is and how a camera can capture the light and convert it into an image.

Color Images
This chapter describes what color images are and how they can be represented.

Point Processing
This chapter presents some of the basic image manipulation methods for understanding and improving the quality of an image. Moreover, the chapter presents one of the basic segmentation algorithms.

Neighborhood Processing
This chapter presents, together with the next chapter, the basic image processing methods, i.e., how to segment or enhance certain features in an image.

Similar to above, but focuses on one particular group of methods.

BLOB Analysis
This chapter concerns image analysis, i.e., how to detect, describe, and classify objects in an image.

Segmentation in Video
While most methods within image processing also apply to video, this chapter presents a particularly useful method for segmenting objects in video data.

This chapter is concerned with how to follow objects from image to image.

Geometric Transformation
This chapter deals with another aspect of image manipulation, namely how to change the geometry within an image, e.g., rotation.

Visual Effects
This chapter shows how video and image processing can be used to create visual effects.

12 + 13 Application Examples
In these chapters concrete examples of video processing systems are presented. The purpose of these chapters is twofold: first, to put some of the presented methods into context, and second, to provide inspiration for what video and image processing can be used for.
