Biomedical image analysis is a highly interdisciplinary field at the interface of computer science, physics, medicine, biology, and engineering. Fundamentally, biomedical image analysis is the application of image processing techniques to biological or medical problems. However, a number of other fields also play an important role:
• Anatomy. Knowledge of shape, structure, and proximity to other anatomical objects can help identify features in images and determine abnormalities.
• Physiology. Physiology plays a role in functional imaging, where functional imaging should be defined very broadly, ranging from blood flow imaged with Doppler ultrasound to cell physiology imaged with microscopy and fluorescent probes. The term functional imaging is often used for modern methods to image physiological processes by means of functional MRI, PET, or SPECT.
• Physics of the imaging modality. Depending on the imaging modality, the image values represent fundamentally different properties of the imaged object. Examples include x-ray attenuation in CT, optical light scattering and absorption in microscopy and optical coherence tomography (OCT), and magnetic spin relaxation time constants in MRI.
• Instrumentation. Even within the same modality, images of the same object can be markedly different. One example is x-ray imaging and CT, where the anode voltage and the introduction of beam-hardening filters influence the apparent x-ray density, whereas beam collimation determines the amount of haze and blur in the image. Another example is the exposure time and source brightness in optical modalities (microscopy, OCT), which may produce contrast in different regions of the object. In addition, every imaging instrument introduces some amount of noise.
• Medical application. The medical application in diagnosis or intervention provides the foundation and motivation for biomedical image analysis. The selection of an imaging modality and of possible image processing steps depends on many medical factors, such as the suspected disease or the type of tissue to be imaged.
In addition, several fields of computer science build on image processing and play a role in biomedical image analysis, most notably artificial intelligence and computer modeling. Artificial intelligence approaches find their way into biomedical image analysis in the form of fuzzy logic [2,26,48], evolutionary computing [9,73], computer learning [44,59], and artificial neural networks [28,79,82]. Computer models play a key role in advanced segmentation techniques and in the description of time-course dynamics [35,50,70].
Biomedical image analysis consists of four distinct stages, where each stage is generally a prerequisite for the next stage, but at any stage the chain can end to allow a human observer to make a decision or record results. These stages are image acquisition, image enhancement and restoration, image segmentation, and image quantification.
The first stage is to gather information about the object, such as a suspect tissue in a patient. In the context of an image, the information is spatially resolved. This means that the image is a map of one or more tissue properties on the nodes of a discrete rectangular grid. The grid nodes coincide with integer coordinates, and the image value on an integer coordinate is termed a pixel (picture element) or voxel (volume element) in three-dimensional images. The image values are stored in finite memory; therefore, the image values themselves are also discrete. In many cases, image values are limited to integer values. Noninteger values can be used as image values if a floating-point representation for a pixel is used, but floating-point values have limited precision as well.
The image values themselves generally have physical meaning. To name a few examples, photography and microscopy provide image values that are proportional to light intensity. Computed tomography provides image values that are proportional to local x-ray absorption. In magnetic resonance imaging, the image values can represent a variety of tissue properties, depending on the acquisition sequence, such as local echo decay times or proton density.
The goal of the image acquisition stage is to obtain contrast. To use x-ray imaging as an example, let us assume a patient with a suspected clavicular hairline fracture. X-rays passing through the clavicle are strongly attenuated, and film optical density is low. X-rays along neighboring paths that lead through soft tissue are less attenuated, and the optical density at corresponding locations of the film is higher. Contrast between bone and surrounding soft tissue is generally high. Some x-rays will pass through the fracture, where there is less bone to pass through, and the corresponding areas of the x-ray film show a slightly higher optical density; that is, they appear darker. In this example it is crucial that the contrast between x-rays passing through the intact clavicle and x-rays passing through the fracture is high enough to be discernible. This task is made more difficult by the inhomogeneity of other tissue regions that are traversed by the x-rays and cause unwanted contrast. Unwanted contrast can be classified as noise in the broader sense (in a stricter sense, noise is a random deviation of a pixel value from an idealized value). Distinguishing between contrast related to the suspected disease (in this case, the clavicular fracture) and contrast related to other sources leads immediately to the notion of the signal-to-noise ratio. Signal refers to information (i.e., contrast) related to the feature of interest, whereas noise refers to information not related to the feature of interest.
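The relationship between signal and noise can be illustrated with a small numerical sketch. The definition used here (difference between the mean value of the feature and the mean value of its surroundings, divided by the standard deviation of the surroundings) is one common convention, often called the contrast-to-noise ratio; it is an assumption made for illustration, since the text does not prescribe a formula. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def contrast_to_noise(feature_pixels, background_pixels):
    """Difference of the mean values divided by the background standard deviation."""
    noise = pstdev(background_pixels)
    if noise == 0:
        raise ValueError("background has no variation; the ratio is undefined")
    return abs(mean(feature_pixels) - mean(background_pixels)) / noise

# Hypothetical optical-density samples (arbitrary units):
# intact bone next to the slightly darker fracture line.
bone = [98, 102, 100, 101, 99]
fracture = [110, 112, 111]
ratio = contrast_to_noise(fracture, bone)
print(f"contrast-to-noise ratio: {ratio:.2f}")
```

A larger ratio means that the fracture stands out more clearly against the random variation of the surrounding bone.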
The human eye is extremely good at identifying meaningful contrast, even in situations with poor signal-to-noise ratios. Human vision allows instant recognition of spatial relationships and makes it possible to notice subtle variations in density and to filter the feature from the noise. A trained radiologist will have no difficulty in identifying the subtle shadow caused by a hairline fracture, a microscopic region of lower optical density in a mammogram, or a slight deviation from the normal shape of a ventricle, to name a few examples. However, these tasks may pose a considerable challenge for a computer. This is where the next steps of the image processing chain come into play.
Image enhancement can serve two purposes: to improve the visibility of features to allow a human observer (radiologist) to make a more accurate diagnosis or better extract information, or to prepare the image for the next processing steps. The most common image enhancement operators are:
• Pixel value remapping. This includes linear or nonlinear contrast enhancement, histogram stretching, and histogram equalization.
• Filtering. Filters amplify or attenuate specific characteristics of an image, and filters make use of the pixel neighborhood. Filters that operate on a limited pixel neighborhood (often based on the discrete convolution operation) are referred to as spatial-domain filters. Other filters use a specific transform, such as the Fourier transform, which describes the image information in terms of periodic components (frequency-domain filters). To name a few examples, filters can be used to sharpen edges or smooth an image, to suppress periodic artifacts, or to remove an inhomogeneous background intensity distribution.
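The two operator families listed above can be sketched in a few lines. This is a minimal illustration in pure Python, assuming a grayscale image stored as a list of rows; the function names, the 3×3 averaging kernel, and the choice to leave border pixels unchanged are assumptions made for the example, not prescriptions from the text.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Pixel value remapping: linearly stretch values to span out_min..out_max."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (v - lo) * scale) for v in row] for row in image]

def convolve3x3(image, kernel):
    """Spatial-domain filter: discrete 3x3 convolution, border pixels copied unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Convolution flips the kernel; a symmetric kernel is unaffected.
                    acc += kernel[1 - dy][1 - dx] * image[y + dy][x + dx]
            out[y][x] = acc
    return out

# Averaging (smoothing) kernel: every neighbor contributes equally.
SMOOTH = [[1 / 9] * 3 for _ in range(3)]

image = [
    [50, 50, 50, 50],
    [50, 90, 50, 50],
    [50, 50, 50, 50],
]
stretched = stretch_contrast(image)
smoothed = convolve3x3(image, SMOOTH)
```

In this example the stretch maps the value range 50 to 90 onto 0 to 255, while the smoothing kernel spreads the single bright pixel over its neighborhood.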
A specific form of image enhancement is image restoration, a filtering technique where the degradation process is assumed to be known. Under this assumption, the image acquisition process is modeled as the acquisition of an idealized, undegraded image that cannot be accessed, followed by the degradation process. A restoration filter is a filter designed to reverse the degradation process in such a manner that some error metric (such as the mean-squared error) is minimized between the idealized unknown image and the restored image. The restored image is the degraded image subjected to the restoration filter. Since the idealized image is not accessible, the design of restoration filters often involves computer simulations or computer modeling.
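The restoration problem can be written compactly. The notation below is introduced here for illustration and is an assumption: f is the idealized image, h the degradation function, n an additive noise term, g the observed (degraded) image, r the restoration filter, and the asterisk denotes convolution. The formulation presumes a linear, shift-invariant degradation with additive noise, which is a common model but not the only possible one.

```latex
\begin{align}
  g(x,y)       &= (h * f)(x,y) + n(x,y) \\
  \hat{f}(x,y) &= (r * g)(x,y) \\
  r            &= \arg\min_{r}\; \mathrm{E}\!\left\{ \bigl[ f(x,y) - \hat{f}(x,y) \bigr]^{2} \right\}
\end{align}
```

The third line expresses the mean-squared-error criterion: among all candidate filters, the restoration filter is the one that makes the restored image deviate least, on average, from the inaccessible idealized image.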
Whether for enhancement or restoration, filter design is a critical step in image processing. Typically, the degradation process introduces two components: blur and noise. In some cases, such as microscopy, inhomogeneous illumination may also play a role. Illumination may change over time, or motion artifacts may be introduced. Unfortunately, filters often require balancing of design criteria. Filters to counteract blurring and filters for local contrast enhancement tend to amplify the noise component and therefore reduce the signal-to-noise ratio. Conversely, noise-reducing filters negatively affect edges and detail texture: Whereas these filters increase the signal-to-noise ratio, image details may get blurred and lost. Moreover, filter design goals depend on the next steps of the image processing chain. To enhance an image for a human observer, the noise component plays a less critical role, because the human eye can recognize details despite noise. On the other hand, automated processing such as segmentation requires that the image contains as little noise as possible even at the expense of some edge and texture detail.
Image segmentation is the step where an object of interest in the image is separated from the background. Background in this definition may include other objects. To perform image segmentation successfully, the object of interest must be distinguishable from background in some way: for example, by a difference in image intensity, by a delineating boundary, or by a difference in texture. Sometimes, a priori knowledge such as a known shape can help in the segmentation process. The goal of this process can be either a mask or an outline. A mask is an image where one pixel value (usually, 1) corresponds to an object pixel, while another pixel value (usually, 0) corresponds to background. An outline can be a parametric curve or a set of curves, such as a polygonal approximation of an object’s shape. There seems to be almost no limit to the complexity of segmentation approaches, and there is certainly a trend toward more complex segmentation methods being more application-specific. An overview of the most popular segmentation methods follows.
Intensity-Based Segmentation. When the intensity of the object of interest differs sufficiently from the intensity of the background, an intensity threshold value can be found to separate the object from its background. Most often, an object is a more or less convex shape, and pixel connectivity rules can be used to improve the segmentation results. Intensity-based segmentation methods that make use of connectivity are region growing and hysteresis thresholding. Intensity thresholds can be either global (for the entire image) or locally adaptive. To some extent, intensity-based thresholding methods can be applied to images where the object texture (i.e., the local pixel intensity distribution) differs from background, because the image can be filtered to convert texture features into image intensity values. The latter is a good example of how image filtering can be used in preparation for image segmentation.
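The difference between a purely global threshold and a threshold combined with a connectivity rule can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as a list of rows, a bright object on a dark background, and 4-connectivity; the function names, the seed position, and the threshold value are hypothetical.

```python
from collections import deque

def threshold(image, level):
    """Global threshold: mask value 1 where the pixel value is >= level, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in image]

def region_grow(image, seed, level):
    """Mask of pixels >= level that are 4-connected to the seed pixel."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    sy, sx = seed
    if image[sy][sx] < level:
        return mask  # the seed itself is not part of the object
    mask[sy][sx] = 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < h and 0 <= nx < w
            if inside and not mask[ny][nx] and image[ny][nx] >= level:
                mask[ny][nx] = 1
                queue.append((ny, nx))
    return mask

image = [
    [10, 10, 10, 10, 10],
    [10, 80, 90, 10, 85],
    [10, 85, 10, 10, 10],
]
global_mask = threshold(image, 50)
grown_mask = region_grow(image, seed=(1, 1), level=50)
```

The global threshold also marks the isolated bright pixel at the right edge, whereas region growing keeps only the pixels connected to the seed.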
Edge-Based Segmentation. Sometimes, objects are delineated by an intensity gradient rather than by a consistent intensity difference throughout the object. In such a case, a local contrast filter (edge detector) can be used to create an image where the outline of the object has a higher intensity than the background. Intensity-based thresholding then isolates the edge. Images containing the edge of the object are prerequisites for parametric segmentation methods such as boundary tracking and active contours. Parametric shapes (i.e., lines, circles, ellipses, or polygonal approximations of a shape) can also be extracted from the edge image using the Hough transform.
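The two steps described above, edge detection followed by thresholding, can be sketched with the Sobel operator as the edge detector. The Sobel operator is one common choice and is an assumption here, as the text does not name a specific detector; the function names, the threshold level, and the decision to set border pixels to zero are likewise illustrative.

```python
import math

def sobel_magnitude(image):
    """Gradient magnitude from the 3x3 Sobel operators; border pixels are set to 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out

def edge_mask(image, level):
    """Threshold the gradient magnitude to isolate edge pixels."""
    return [[1 if m >= level else 0 for m in row] for row in sobel_magnitude(image)]

# A vertical step edge between a dark and a bright region.
image = [
    [0, 0, 0, 100, 100, 100],
    [0, 0, 0, 100, 100, 100],
    [0, 0, 0, 100, 100, 100],
    [0, 0, 0, 100, 100, 100],
]
edges = edge_mask(image, level=200)
```

Only the pixels adjacent to the step respond, so the thresholded result is a thin band along the object boundary rather than a filled region.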
Region-Based Segmentation. Some segmentation approaches make more extensive use of local similarity metrics. Regions may be similar with respect to intensity or texture. In fact, a feature vector can be extracted for each pixel that contains diverse elements, including intensity, local intensity variations, or directional variations. Similarity could be defined as the Euclidean distance between feature vectors. Unsupervised region-based methods are region splitting and region merging or region growing. Region splitting starts with the entire image as one region and recursively subdivides it into squares wherever a square contains dissimilar pixels, whereas region growing starts at the pixel level and joins similar regions. The two methods can be combined to form split-merge segmentation, where splitting and merging alternate.
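The splitting half of this approach can be sketched as a recursive subdivision. The homogeneity criterion used here (the difference between the largest and smallest intensity in a block must not exceed a tolerance) is an assumption chosen for simplicity; a feature-vector distance could be substituted. The function name and parameters are hypothetical, and the merging step is not shown.

```python
def split_regions(image, x0, y0, width, height, max_range):
    """Recursively split a rectangle until its intensity range is <= max_range.

    Returns a list of (x0, y0, width, height) tuples, one per homogeneous block.
    """
    values = [image[y][x]
              for y in range(y0, y0 + height)
              for x in range(x0, x0 + width)]
    if max(values) - min(values) <= max_range or (width == 1 and height == 1):
        return [(x0, y0, width, height)]
    half_w, half_h = max(width // 2, 1), max(height // 2, 1)
    blocks = []
    for by, bh in ((y0, half_h), (y0 + half_h, height - half_h)):
        for bx, bw in ((x0, half_w), (x0 + half_w, width - half_w)):
            if bw > 0 and bh > 0:  # skip empty quadrants of one-pixel-wide blocks
                blocks += split_regions(image, bx, by, bw, bh, max_range)
    return blocks

image = [
    [90, 90, 10, 10],
    [90, 90, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
blocks = split_regions(image, 0, 0, 4, 4, max_range=5)
```

In this example the full image is not homogeneous and is split once; each of the four quadrants is then uniform, so the recursion stops. A subsequent merging step would join the three dark quadrants into one region.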
Clustering. If feature pixels are already separated from the background, clustering is a method to group the feature pixels into clusters. Assignment of individual pixels to clusters can be based on the Euclidean distance to the cluster center or on other similarity metrics. The most widely used methods are k-means clustering, where each pixel is assigned to exactly one cluster, and fuzzy c-means clustering, where cluster membership is continuous (fuzzy) and each pixel belongs, to some degree, to multiple clusters.
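The k-means idea can be sketched for the simplest case, where each pixel is described by a single intensity value, so the Euclidean distance reduces to an absolute difference. The function name, the initial cluster centers, and the iteration limit are assumptions made for the example; fuzzy c-means is not shown.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on scalar values; returns (centers, labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the nearest cluster center.
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, labels

# Hypothetical pixel intensities from a dark and a bright tissue class.
intensities = [12, 15, 11, 80, 85, 90, 14, 88]
centers, labels = kmeans_1d(intensities, centers=[0, 100])
```

Each pixel ends up with exactly one label, which is the defining property of k-means; a fuzzy variant would instead return a membership value per pixel and cluster.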
Neural Networks. In medical imaging, a widely used approach is to train artificial neural networks with feature vectors that have been manually assigned to classes. Once a neural network has been trained, it can then segment similar images in an unsupervised manner. Although unsupervised segmentation is desirable, some manual interaction can greatly facilitate the segmentation task. Examples of limited manual interaction are the placement of seed points for region growing, and crude object delineation for active contour models and live-wire techniques.
If the segmentation result is a binary mask, some postprocessing may improve the segmentation result. Examples of postprocessing steps are removal of isolated pixels or small clusters, morphological thinning and extraction of a single-pixel-wide skeleton, morphological operations to reduce boundary irregularities, filling of interior holes or gaps, and the separation of clusters with weak connectivity.
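One of the steps listed above, the removal of isolated pixels and small clusters, can be sketched as a connected-component search over the mask. The use of 4-connectivity, the function name, and the size limit are assumptions made for the example; the other postprocessing steps are not shown.

```python
from collections import deque

def remove_small_clusters(mask, min_size):
    """Return a copy of a binary mask without 4-connected clusters smaller than min_size."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Collect all pixels of the cluster that contains (y0, x0).
            cluster, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) < min_size:
                for y, x in cluster:
                    out[y][x] = 0
    return out

mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_clusters(mask, min_size=2)
```

The four-pixel object is kept, while the two isolated single pixels are set to background.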
Similar to the way that a radiologist uses an image to assess the degree of a disease for a diagnosis, image quantification encompasses methods to classify objects or to measure the properties of an object. Image quantification requires that the object be segmented. The goal of image quantification is either to classify an object (e.g., as diseased or healthy) or to extract a continuous descriptor (e.g., tumor size or progression). The advantage of computerized image quantification is its objectivity and speed.
Examples of continuous variables include the measurement of intensity, density, size, or position. As an example, bone mineral density is a crucial determinant of bone strength, and people (especially women) lose bone mineral with age, a condition that may lead to osteoporosis. X-ray imaging techniques (quantitative CT and dual-energy x-ray absorptiometry) are particularly suited to measuring bone mineral density [29]. The degree of osteoporosis, and with it a patient's risk of suffering spontaneous fractures, is often determined by comparing bone density to an age-matched distribution [25]. Today, this measurement is usually highly automated, with unsupervised segmentation of the bone examined (e.g., vertebrae or the calcaneus) and subsequent determination of the bone mineral density, measured in milligrams of calcium hydroxyapatite per milliliter of bone, from the x-ray attenuation.
Examples of classification include healthy/diseased or healthy tissue/benign lesion/malignant lesion. A typical example is the classification of suspicious masses in mammograms as benign or malignant. This classification can be based on the shape of the segmented lesion [61] or on the texture [67]. To illuminate the classification process, let us look at texture analysis, a process that is representative of a large number of similar approaches to classifying lesions in x-ray mammograms. Sahiner et al. propose segmenting the lesion and extracting its outline, then transforming a narrow band of pixels perpendicular to the outline onto a rectangular region where descriptive metrics can be extracted from the texture [67]. In their study a total of 41 scalar values were extracted from each image using methods based on the co-occurrence matrix and run-length analysis. The 41 values formed a descriptive feature vector, and a classification scheme based on computer learning (Fisher's linear discriminant classifier [45]) was used by radiologists to produce a malignancy rating from 1 to 10 [10]. The generation of high-dimensional feature vectors and the use of artificial intelligence methods to obtain a relatively simple decision from the feature vector are very widespread in image quantification.
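To give an impression of how a single scalar texture value can be obtained from a co-occurrence matrix, the following sketch counts gray-level pairs at a fixed pixel offset and derives the contrast descriptor from the counts. This is not the procedure of Sahiner et al.; the offset (one pixel to the right), the number of gray levels, the choice of the contrast descriptor, and the function names are assumptions made for illustration, and the image is assumed to hold integer gray levels below the stated number of levels.

```python
def cooccurrence(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at the offset (dx, dy)."""
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts

def contrast_feature(counts):
    """Co-occurrence contrast: (i - j)^2 weighted by the probability of the pair (i, j)."""
    total = sum(sum(row) for row in counts)
    return sum((i - j) ** 2 * c / total
               for i, row in enumerate(counts)
               for j, c in enumerate(row))

# Small test texture with four gray levels (0 to 3).
texture = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm = cooccurrence(texture, levels=4)
contrast = contrast_feature(glcm)
```

Repeating this computation for several offsets and several descriptors yields a feature vector of the kind described above, which a classifier can then reduce to a single decision or rating.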