Automatic Wood Log Segmentation Using Graph Cuts (Computer Vision, Imaging and Computer Graphics) Part 1

Abstract. Segmenting foreground from background automatically is an active field of research. The graph cut approach is one of the promising methods to solve this problem. This approach requires that the weights of the graph are chosen optimally in order to obtain a good segmentation. We address this challenge focusing on the automatic segmentation of wood log images. We present a novel method based on density estimation to obtain information about both foreground and background. With this information the weights in the graph cut method can be set automatically. In order to validate our results, we use four different methods to set these weights. We show that of these approaches, our new method obtains the best results.

Keywords: Image segmentation, Graph cuts, Foreground extraction, Weight setting, Density estimation.

Introduction

Graph-cut based segmentation techniques are a very powerful tool in image segmentation. In interactive image analysis, e.g. in medical imaging [7], it is possible to get a good segmentation with only a few refinements. In a fully automatic system there is no possibility for refinements, so the initialization of the graph-cut algorithm, mainly setting the weights of the graph, becomes a difficult problem. In this paper we address this problem specifically for the automatic and robust segmentation of wood log images.

The volume of wood and the sizes of the logs are important factors in commercial and logistic processes in the timber industry. There are different methods to measure the amount of wood. The most reliable techniques for volume analysis are laser scanning or weighing the wood. These methods, however, are mainly used in factories since they are relatively difficult to apply. In forests, estimates are often used instead. After logs have been cut they are piled up into stacks. The amount of cut wood is then estimated from the front side of the stack. Furthermore, the distribution of log diameters is estimated by visual judgment only. It is described by a constant representing the average diameter, because measuring the diameter of each log is impossible in practice.


Our aim is to use computer vision methods to make these front side measurements faster and more reliable. Separating the log cut surfaces from the background leads to the well known problem of binary image segmentation. A robust automatic binary segmentation allows more accurate volume estimation techniques than the manual one described briefly above. The imaging devices we use are mobile phone cameras, as these are lightweight, commonplace devices. On the other hand, their optics involve some trade-offs in image quality. Another challenge is the high variation resulting from lighting conditions in the forest and seasonal changes. Furthermore, we restrict ourselves to images taken from frontal positions.

The contribution of this paper is a methodology for automatic and robust foreground / background segmentation of these low quality wood log images. We use a min-cut/max-flow graph-cut algorithm in conjunction with a kd-tree accelerated density estimation. Our novel density estimation, which we call KD-NN in the following, improves current approaches with respect to robustness and is relatively simple to implement. The methodology is split into two parts. First, we make use of some properties of the image structure to initialize models in color space. Second, we use these models to obtain the final segmentation. The graph-cut algorithm is used in both steps.

Related Work

Much research has been done on the inspection of wood [1] [14], but the segmentation of wood logs has barely been considered in imaging. F. Fink describes in [9] classical vision algorithms and active contours to find the log cut surfaces. The proposed procedure is only semi-automatic and makes many assumptions about the type of wood, the lighting conditions and the image quality. The photos of the stack of wood are taken in a controlled environment, which makes the proposed algorithms unusable in our scenario.

In computer vision there is a variety of approaches to tackle the binary segmentation problem [10] [6]. However, when it comes to stable and fully automatic segmentation of natural images, things get complicated. Normalized Cuts [17] can be used to find a reasonable partition of the image, but these kinds of algorithms are optimized to solve the grouping problem in vision.

A different approach is the popular binary segmentation framework from the field of combinatorial optimization, in which a graph cut is obtained by solving for the maximum flow [3]. The corresponding minimum cut then describes the resulting partition of the image. There are algorithms that are optimized for grid graphs [4] and provide a feasible solution in a few seconds, even for megapixel sized images.

There are different approaches to set the weights of the graph. The weights between two adjacent vertices are often set using some metric, for instance the Euclidean distance. While finding a pixel-to-pixel distance is straightforward, existing algorithms differ in how they set the weights of the remaining edges to the source and the sink node. A good and practical solution for grayscale images is to use a histogram, which describes the two distributions required for binary segmentation [3]. In the case of RGB images, it is more challenging to find a model for the two distributions. In [5], which is derived from [12], a Gaussian Mixture Model (GMM) is used for this description. However, that method deals with interactive segmentation.

We implemented this GMM method and found it to be unstable, since it is quite prone to outliers. In interactive segmentation this is less important, because the model can be refined quickly, but it does not meet our requirements. For our comparison we therefore replaced the method for estimating the GMMs with the Expectation Maximization algorithm.


Fig. 1. Logs with different shape and color

Problem Discussion

The volume of a stack of wood, especially in solid cubic meters, can simply be computed by multiplying the area of the log cut surfaces with the depth. The depth of a stack of wood is known, but the area of the cut surfaces is not. To determine the wood area from images, a photo of the stack of wood must be taken from a frontal position in which the log cut surfaces are visible (see figure 2).

To obtain the wood area, the image must first be transformed into a real-world coordinate system, so that every pixel corresponds to the same area in square meters. A stack of wood often does not fit into a single image, so several images of one stack must be taken and stitched together. Then a segmentation is required to separate the wood from the non-wood pixels. How to transform the images and stitch them together is beyond the scope of this paper; here we only address the problem of segmentation.

The objective is to separate the log cut surfaces automatically and robustly in different images. In practice, the log cut surfaces of a stack of wood vary in shape and color. The shape of a log cut surface appears to be close to a circle or an ellipse, but this is not always the case in practice (see figure 1). Therefore a shape finding technique, e.g. ellipse fitting, cannot be applied. Furthermore, the color differs from one stack of wood to another and depends on the wood type, so no general color matching is appropriate.

Logs exhibit a certain self-similarity, but to varying degrees. Hence, simple region-based methods, e.g. watershed or split and merge, lead to the well-known problem of under- or over-segmentation. In summary, there is no exact color or shape that can be used for all logs, but within a single stack of wood the color is mostly similar and the gradients between logs and non-logs are often high.


Fig. 2. A sample image of a stack of wood

For this reason, we first extract color information from the image and then use it to segment the image. A graph-cut approach is used in order to exploit both characteristics, local gradient and global color.

Our Approach

For our approach we require some general restrictions on the image acquisition to have some context information. First, the image needs to be taken from a frontal position, as mentioned earlier, and the stack of wood has to be in the center. Second, the upper and lower area of the image must not contain log cut surfaces. This is commonly fulfilled in practice because a stack of wood has a limited height (see also figure 2).

In the following, the image part that represents the log cut surfaces is called foreground and the remaining areas are called background. This follows the common naming conventions in graph-cut papers.

The segmentation is generally split into two main parts (see figure 3). In the first step, a region in the center of the input image is used to extract information about the log cut surfaces and the darker regions in between. This step uses a number of image properties and a first graph cut in which the graph weights are set with our novel KD-NN method.

Additionally, one subimage from the bottom and one from the top are used to gain information about the background. Both subimages are needed because they represent characteristics of quite different parts of the image, e.g. sky, forest, soil, snow or grass.

This information is used together with the background characteristics, i.e. the objects in front of and behind the stack of wood, to apply the graph cut a second time to the whole input image, this time with different weight setting algorithms for later comparison. All steps are described in detail in the following sections.


Fig. 3. The two parts of the segmentation procedure

Our Novel Approach for Fore- and Background Extraction

The aim of the first part of the segmentation is to find a first estimate for the foreground (log cut surfaces) and background color models. These models are necessary for the graph-cut algorithm to accurately segment the log cut surfaces. The following description is also illustrated in figure 3.

Let I denote the input image of size m x n, where m is the height and n the width. We extract three subimages from I. First, we extract a subimage Ic from the center region. The other two subimages, Ib1 and Ib2, are extracted from the top and bottom of I, respectively. Due to the constraints in image acquisition, Ic contains only log cut surfaces and the shadowed regions in between, whereas Ib1 and Ib2 contain regions to be classified as background. We chose Ic to be a fixed fraction of I, and Ib1 and Ib2 to be horizontal bars [exact dimensions not recoverable]. Our experiments have proven these dimensions reasonable.

Whereas the pixels of Ib1 and Ib2 can be used directly for the background model, we need to segment Ic into foreground and background pixels. We do this using a stable novel method, which is described in detail in the following.
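As a small illustration, the subimage extraction could look like the following sketch. The proportions used here are hypothetical placeholders, since the exact dimensions are not reproduced above.

```python
import numpy as np

def extract_subimages(I, center_frac=0.33, bar_frac=0.1):
    """Sketch of extracting Ic, Ib1 and Ib2 from the input image I (m x n x 3).

    center_frac and bar_frac are illustrative placeholders, not the values
    used in the paper.
    """
    m, n = I.shape[:2]
    ch, cw = int(m * center_frac), int(n * center_frac)
    r0, c0 = (m - ch) // 2, (n - cw) // 2
    Ic = I[r0:r0 + ch, c0:c0 + cw]   # center region: log cut surfaces and shadows
    bh = int(m * bar_frac)
    Ib1 = I[:bh]                     # horizontal bar at the top (background)
    Ib2 = I[m - bh:]                 # horizontal bar at the bottom (background)
    return Ic, Ib1, Ib2
```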

Segmenting Ic is much easier than the segmentation of I because the difference in luminance between the log cut surfaces and shadowed regions in between is very strong. Nevertheless, muddy logs, different types of wood and leaves are disturbing factors.

Hence, for a stable segmentation we use the intersection of two binary threshold segmentations, each performed in a different color space. The V channel of the HSV color space is thresholded directly. In RGB color space we exploit the observation that wood surfaces often contain a strong yellow component. Therefore, we extract a yellow channel Y from the RGB image pixel-wise using the following equation,

[Equation: pixel-wise computation of the yellow channel Y from the R, G and B channels]

Both channels V and Y are automatically thresholded by using the method in [13]. The resulting binary images are Vb and Yb. The intersection of both

$$T_{fg} = V_b \cap Y_b, \qquad T_{bg} = \overline{V_b} \cap \overline{Y_b}$$

results in a trimap T = {Tfg, Tbg, Tunknown}, where Tfg is the foreground, Tbg is the background and Tunknown contains the pixels of which it is not known whether they belong to the foreground or the background. Tunknown is further expanded by morphological operators to ensure a clean segmentation.
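The following sketch illustrates this trimap construction. It assumes Otsu's method as the automatic thresholding of [13] and uses a simple, hypothetical formula for the yellow channel, since the exact equation is not reproduced above.

```python
import cv2
import numpy as np

def build_trimap(ic_bgr, unknown_dilate_iter=3):
    """Sketch of the trimap construction on the center subimage Ic.

    Assumptions: Otsu's method stands in for the automatic threshold of [13],
    and the yellow channel Y is approximated here as (R + G) / 2 - B.
    """
    # V channel of the HSV color space, thresholded directly.
    v = cv2.cvtColor(ic_bgr, cv2.COLOR_BGR2HSV)[:, :, 2]
    _, vb = cv2.threshold(v, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Hypothetical yellow channel from the RGB image (placeholder formula).
    b, g, r = cv2.split(ic_bgr.astype(np.float32))
    y = np.clip((r + g) / 2.0 - b, 0, 255).astype(np.uint8)
    _, yb = cv2.threshold(y, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Intersection of both binary images: foreground where both agree on 1,
    # background where both agree on 0, unknown where they disagree.
    trimap = np.full(vb.shape, 2, dtype=np.uint8)   # 2 = unknown (Tunknown)
    trimap[(vb == 1) & (yb == 1)] = 1               # 1 = foreground (Tfg)
    trimap[(vb == 0) & (yb == 0)] = 0               # 0 = background (Tbg)

    # Expand the unknown region with a morphological dilation so that
    # uncertain border pixels are not assigned to either model.
    unknown = (trimap == 2).astype(np.uint8)
    unknown = cv2.dilate(unknown, np.ones((3, 3), np.uint8),
                         iterations=unknown_dilate_iter)
    trimap[unknown == 1] = 2
    return trimap
```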

To get a more accurate binary segmentation of Ic, a first graph-cut segmentation is applied. The pixel sets Tfg and Tbg are used to build the foreground and background model. The result is a binary segmentation B = {Bfg, Bbg}. For the final segmentation a second graph cut is applied to I. The pixel set describing the foreground is Bfg, and the background model is built using the pixel set union of Bbg with the pixels of Ib1 and Ib2.

Graph-Cut and Weight Setting

Graph based image segmentation methods represent the problem in terms of a graph G = (V, E). The graph consists of a set of nodes v ∈ V and a set of weighted edges e ∈ E connecting them. In an image, each node corresponds to a pixel, which is connected to its 4 neighbors. Additionally, there are two special nodes called terminals (source and sink), which represent foreground and background. Each node also has a link to both terminals.

In the following, let wr be the weights of the edges connecting pixels and terminals and let wb be the weights of the inter-pixel edges. We assume that all weights wb, wr of the graph lie in the interval [0,1]. A vector in feature space (RGB color space) is denoted by x.

For setting wb we apply the following formula, where i and j indicate adjacent graph nodes,

$$w_b(i,j) = \exp\left(-\frac{\lVert x_i - x_j \rVert_1}{\alpha}\right)$$

We used the Manhattan metric because it is slightly faster than the Euclidean metric, and the difference in segmentation results was negligible for our images. A higher value of the free parameter α leads to a higher similarity between two feature vectors.
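A minimal sketch of the boundary weight computation on a 4-neighborhood is shown below. The exponential of the negative, α-scaled Manhattan distance is an assumed form consistent with the description above (weights in [0,1], larger α giving higher similarity).

```python
import numpy as np

def boundary_weights(img, alpha=16.0):
    """Sketch of the inter-pixel (boundary) weights wb on a 4-neighborhood.

    img is an (m, n, 3) array in RGB; alpha is the free parameter from the
    text (the default value here is illustrative).
    """
    img = img.astype(np.float32)
    # Manhattan (L1) distance between horizontally / vertically adjacent pixels.
    d_right = np.abs(img[:, 1:] - img[:, :-1]).sum(axis=2)
    d_down = np.abs(img[1:, :] - img[:-1, :]).sum(axis=2)
    # Map distances to weights in [0, 1]; identical pixels get weight 1.
    w_right = np.exp(-d_right / alpha)
    w_down = np.exp(-d_down / alpha)
    return w_right, w_down
```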

For setting the wr to the terminal nodes we implemented four different feature space analysis methods, each of which is described in detail in the following subsections. In every case we built two models, one for the foreground and one for the background respectively. We did not introduce indices in the formulas to keep the notation uncluttered.

After having set all weights, the min-cut/max-flow algorithm from [4] is applied.
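As an illustration, the grid graph can be built and solved with the PyMaxflow library, which wraps the min-cut/max-flow implementation of [4]; the use of this particular library is our assumption, not stated in the text.

```python
import numpy as np
import maxflow  # PyMaxflow, wrapping the min-cut/max-flow code of [4]

def graph_cut_segment(w_right, w_down, w_src, w_sink):
    """Sketch of building the grid graph and solving min-cut/max-flow.

    w_src and w_sink are the terminal weights wr produced by one of the
    weight setting methods below; w_right and w_down are the boundary
    weights wb between horizontal and vertical 4-neighbors.
    """
    m, n = w_src.shape
    g = maxflow.Graph[float]()
    nodeids = g.add_grid_nodes((m, n))

    # n-links between 4-neighbors with symmetric capacities wb.
    for i in range(m):
        for j in range(n):
            if j + 1 < n:
                g.add_edge(nodeids[i, j], nodeids[i, j + 1],
                           w_right[i, j], w_right[i, j])
            if i + 1 < m:
                g.add_edge(nodeids[i, j], nodeids[i + 1, j],
                           w_down[i, j], w_down[i, j])

    # t-links to the source (foreground) and sink (background) terminals.
    g.add_grid_tedges(nodeids, w_src, w_sink)

    g.maxflow()
    # True where a node ends up on the sink side of the minimum cut.
    return g.get_grid_segments(nodeids)
```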

In the following we describe four different methods to determine wr by using the foreground pixels Bfg and the background pixels Bbg. Afterward, the results will be presented and compared.

Histogram Probability. A simple method to determine a probability is to use a histogram. We use a 3D histogram to approximate the distribution of the foreground and background over the color space. One bin of the histogram represents a probability ph, which can be used to determine wr. To reduce noise, we scale the histogram with a factor hs ∈ [0,1], which is set experimentally and used for comparison. The number of bins per color channel in our case is ⌊256 · hs⌋. A problem with histograms are peaks, which lead to many low probabilities. Peaks in our case arise from many pixels with the same color in Bbg or Bfg. To get a suitable weight, the histogram is first normalized so that the maximal bin has a probability of 1. Afterward, the weights are determined by the following equation, which gave the best results in our experiments,

[Equation: histogram-based weight wr as a function of the bin value ph, the sum of all bins sumh and the free parameter β]

Here sumh denotes the sum of all bins, ph the bin value, i.e. the probability of the corresponding pixel, and β is a free parameter.
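A sketch of the histogram model is given below. Since the weight equation itself is not reproduced above, the sketch only builds the scaled, peak-normalized 3D histogram and looks up ph for a pixel; the final mapping with β is left to that equation. The helper names are hypothetical.

```python
import numpy as np

def build_color_histogram(pixels, hs=0.25):
    """Sketch of the 3D color histogram model.

    pixels is an (N, 3) uint8 array (e.g. Bfg or Bbg); hs in [0, 1] scales
    the resolution, giving floor(256 * hs) bins per channel. The histogram
    is normalized so that the largest bin equals 1, as described above.
    """
    bins = int(np.floor(256 * hs))
    hist, edges = np.histogramdd(pixels.astype(np.float64),
                                 bins=(bins, bins, bins),
                                 range=((0, 256), (0, 256), (0, 256)))
    sum_h = hist.sum()
    hist /= hist.max()      # normalize so that the maximal bin is 1
    return hist, edges, sum_h

def lookup_ph(hist, pixel):
    """Return the normalized bin value ph for one RGB pixel."""
    bins = hist.shape[0]
    idx = np.minimum((pixel.astype(np.float64) / 256 * bins).astype(int),
                     bins - 1)
    return hist[idx[0], idx[1], idx[2]]
```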

K-Means Clustering. Clustering is a commonly used method for grouping similar points into clusters. K-Means is a simple clustering algorithm and is often used to cluster the color space. In our case we apply K-Means to cluster the background and foreground pixels. The result is k clusters with means mi, where i ∈ {1,...,k}. There are different ways to determine the weights. One possibility is to find the nearest cluster mean mi, calculate the distance dmin between the pixel and mi and determine the weight directly. Another way is to determine the weights from the average of the distances to all cluster means mi. Additionally, the number of pixels per cluster can be included in the weight computation. We found experimentally that the best segmentation results are obtained by using the Euclidean distance to the nearest cluster mean and a free parameter γ:

[Equation: weight wr as a function of the Euclidean distance dmin to the nearest cluster mean and the free parameter γ]
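A sketch of this method follows. The exponential mapping of dmin is an assumption; the text only states that the Euclidean distance to the nearest cluster mean and the free parameter γ are used.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_weights(model_pixels, query_pixels, k=8, gamma=32.0):
    """Sketch of the K-Means based terminal weights.

    model_pixels: (N, 3) array of foreground (or background) pixels;
    query_pixels: (M, 3) array of pixels whose weight wr is needed.
    k and gamma are illustrative values; exp(-dmin / gamma) is an assumed
    mapping, not the paper's formula.
    """
    km = KMeans(n_clusters=k, n_init=10).fit(model_pixels.astype(np.float64))
    # Euclidean distance from each query pixel to every cluster mean.
    d = np.linalg.norm(query_pixels[:, None, :].astype(np.float64)
                       - km.cluster_centers_[None, :, :], axis=2)
    d_min = d.min(axis=1)               # keep the distance to the nearest mean
    return np.exp(-d_min / gamma)       # weights in [0, 1]
```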

Gaussian Mixture Models. A Gaussian Mixture Model (GMM) is a compact way to describe a density function. It may be seen as a generalization of K-Means clustering. The standard way to find the mean vectors and covariance matrices of a GMM with K components is the Expectation Maximization algorithm [2]. To speed up learning we additionally sampled from our training data.

When predicting from the model, we cannot take the density function values directly to initialize the weights of the graph. The reason for this are very small probability values and sharply peaked Gaussian components. Instead, we use the model in a similar fashion to the K-Means clustering. We leave the normalization factors out in the prediction phase, so our model reduces to

$$w_r(x) = \sum_{k=1}^{K} \kappa_k \exp\left(-\frac{1}{2}\,(x-\mu_k)^{T}\,\Sigma_k^{-1}\,(x-\mu_k)\right)$$

The κk sum up to 1, which means the weights wr will lie in [0,1]. The distance to the cluster centers is anisotropic, whereas the simple K-Means approach leads to isotropic distances. We call this method EM-GMM in the following.
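A sketch of the EM-GMM weight computation under the reduced model above, using scikit-learn's EM implementation; the subsample size and component count are illustrative choices, not values from the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_gmm_weights(model_pixels, query_pixels, k=5, sample_size=5000):
    """Sketch of the EM-GMM terminal weights (mixing coefficients times
    unnormalized Gaussian components)."""
    rng = np.random.default_rng(0)
    data = model_pixels.astype(np.float64)
    if len(data) > sample_size:                  # subsample to speed up EM
        data = data[rng.choice(len(data), sample_size, replace=False)]

    gmm = GaussianMixture(n_components=k, covariance_type='full').fit(data)

    x = query_pixels.astype(np.float64)
    w = np.zeros(len(x))
    for kappa, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        diff = x - mu
        # Mahalanobis term (x - mu)^T Sigma^-1 (x - mu), without the
        # Gaussian normalization factor.
        m = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        w += kappa * np.exp(-0.5 * m)
    return w                                      # lies in [0, 1]
```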

Our Novel Density Estimation by Nearest Neighborhood. Our novel method is based on density estimation using a kd-tree, which we refer to as KD-NN. To set the weights to the source and sink node we use two kd-tree based models. The kd-trees contain all selected pixels and their associated values; one contains the foreground and one the background pixels. Thus all information is stored and used to set the weights in a later step by nearest neighbor (NN) search. The model is similar to a photon map in [11]: a photon map contains photons, ours contains pixels, and both are used for density estimation. For a fast NN search a balanced kd-tree is needed. We use a data driven kd-tree and store each pixel according to its color value. Each node of the tree splits the color space along one dimension and stores the position in color space and the number of pixels located there. We build a balanced kd-tree [11]; building a kd-tree in this way is an O(n log n) operation.
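As a sketch, the two models can be represented with off-the-shelf kd-trees; here scipy's cKDTree stands in for the balanced, data-driven kd-tree described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_color_trees(fg_pixels, bg_pixels):
    """Sketch: one kd-tree per model, built over the RGB values of the
    selected pixels (Bfg and Bbg)."""
    fg_tree = cKDTree(fg_pixels.astype(np.float64))   # foreground model
    bg_tree = cKDTree(bg_pixels.astype(np.float64))   # background model
    return fg_tree, bg_tree
```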

Similar to [8], nearest neighborhoods are involved. The NN search is used for density estimation and for setting the region weights wr. In particular, for every graph node v the corresponding density in color space within a sphere is used.

The first step is to determine the density with respect to all pixels in the kd-tree. Hence, the number of pixels pall in the kd-tree and the volume of the color space are used to calculate an overall density ρall. In our case we use the RGB color space, where R, G, B ∈ [0, 255], and compute the density,

$$\rho_{all} = \frac{p_{all}}{255^3}$$

We estimate the density for every v by using its sphere environment (see figure 4). The number of pixels within a sphere of predefined radius r is searched for in the kd-tree. The NN search in a balanced kd-tree is an O(log n) operation [16]. The volume of the search sphere vs is

$$v_s = \frac{4}{3}\pi r^3$$

The density ρs within the sphere environment is estimated from the number of found pixels ps. It is used for the weight calculation in the following section,

$$\rho_s = \frac{p_s}{v_s}$$
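Putting the last three equations together, the density estimation per graph node can be sketched as follows; the radius value is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def sphere_densities(tree, pixels, r=20.0):
    """Sketch of the KD-NN density estimation.

    tree is the kd-tree of one model (foreground or background),
    pixels an (M, 3) array of RGB values of the graph nodes, and r the
    predefined search radius (the value 20 is illustrative).
    """
    # Overall density: all pixels in the tree over the color space volume.
    rho_all = tree.n / float(255 ** 3)

    # Count the neighbors inside the search sphere for every pixel ...
    counts = np.array([len(idx) for idx in
                       tree.query_ball_point(pixels.astype(np.float64), r)])
    # ... and turn the counts into densities via the sphere volume.
    v_s = 4.0 / 3.0 * np.pi * r ** 3
    rho_s = counts / v_s
    return rho_all, rho_s
```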

The setting of wr is done using the density ρs. However, the number of pixels in each kd-tree, and hence ρall, differs between the foreground and background models. Hence, we use the overall density ρall to determine a factor s, which is used for the weight computation.

The idea is to map the overall density ρall, which is also the mean density of the spheres over all possible positions, to a defined weight wm. So, if the mean density is found, the weight wr will be equal to wm.


Fig. 4. The pixels of the log cut surfaces are visualized as points in RGB color space. The yellow sphere in the center demonstrates the search environment for the density estimation. Also, there are a number of outliers.

In addition, a density greater than ρall must produce a high weight and a lower density a low weight, and all weights must lie in the interval [0,1]. Therefore, the factor is determined by the following equation,

[Equation: factor s determined from the overall density ρall and the predefined weight wm]

Finally, the region weights are estimated from the density in the search sphere and the predetermined factor s by

[Equation: region weight wr determined from the sphere density ρs and the factor s]
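Since the last two equations are not reproduced above, the following is only one plausible mapping that satisfies the stated constraints (wr ∈ [0,1], wr = wm when ρs = ρall, and higher densities giving higher weights); it is an assumption, not the paper's formula.

```python
import numpy as np

def kdnn_weights(rho_s, rho_all, w_m=0.5):
    """Sketch of one plausible density-to-weight mapping for KD-NN.

    The saturating exponential below is an assumed form: wr stays in
    [0, 1], equals w_m when rho_s == rho_all, and grows with the density.
    """
    s = -np.log(1.0 - w_m) / rho_all        # assumed definition of the factor s
    return 1.0 - np.exp(-s * np.asarray(rho_s))
```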
