Advances in Automated Restoration of Archived Video (Digital Imaging) Part 3

Global Defects

The defects discussed so far have all been local in nature: blotches and lines affect only certain areas of each frame, leaving other areas untouched. Random brightness fluctuation (called flicker) is also common in archived sequences, as is shake or warping of each frame; these artifacts affect the entire image. Shake occurs in archived film because of (1) warping of the film medium and (2) worn guide holes in the film transport mechanism, which cause each frame to sit at a slightly different position relative to the scanning or display equipment. Flicker is caused by degradation of the medium (ageing of the film stock), varying exposure times, or the curious effects of poor standards conversion. Varying exposure time is common in hand-cranked footage, but it also occurs with the mechanized cameras of early films and, more recently, with personal 8mm cameras. The flicker artifact is still a problem today, even with digital cameras. One frequent source of modern flicker is radiometric calibration. Consider, for instance, time-lapse sequences, where frames are captured at long intervals, or sequences assembled from multiple cameras, as in the "inbetweening" special effect used in The Matrix (1999). These kinds of sequences usually flicker when played back because each frame has been captured under different lighting conditions and calibration settings.

Of these two artifacts, shake has received a huge amount of attention in the literature, though not for archival purposes: it is the rise of handheld digital video cameras that has made handheld video shake a common problem and hence attracted widespread attention. Shake in archive material can be much more difficult to remove than handheld camera shake, however, since nonlinear warping can occur that has little to do with the camera motion. Furthermore, in archival material, shake (or warp) and flicker often occur together, making the removal process that much more difficult. Kokaram et al. [63] in 2003 were the earliest to address these two defects in tandem, if only to note that warp removal is best performed after flicker removal. Work on removing camera shake can indeed be applied to archival material; the underlying idea in both cases is to estimate the global motion of the scene in its entirety and then filter out the random global motion components. Simple FIR (Finite Impulse Response) filters were used in [63] together with an affine model for the global motion component. Recent work by Liu et al. in 2009 [64] presents what amounts to a breakthrough in handheld camera shake removal: the 3D trajectory of the camera is estimated using structure-from-motion ideas, and the camera path is then smoothed in the scaled 3D space. The main contribution is the development of image warps that do not over-distort the scene while stabilizing it. Handheld camera shake remains a very active and important industrial topic for mobile phone video capture, but there remains a gap in the application of those ideas to archival material. Missing areas and brightness flicker will confuse a structure-from-motion algorithm, for instance, and the fact that the observed image instability is not due to camera position alone (physical distortion of the image medium is also an issue) means that there is still work to be done here. Very interestingly, in 2009 Robinson et al. commercialized a solution to the problem of CMOS rolling shutter. In modern CMOS (Complementary Metal-Oxide Semiconductor) devices the frame scan time is much slower than with CCD arrays; as a result, a visible warp of the image can occur when the capture device is moved during capture. This is related to the archival problem in the sense that the observed "shake" is then a combination of camera motion and image warp. Their solution is based on detailed optic-flow analysis of the scene, but no further details are currently available. We do not address the warp/shake problem further in this section except to note that this area remains open as far as archive restoration is concerned.



FIGURE 11.13

Example of strongly localized archive flicker. On the left, the original sequence (showing a strong black diagonal swathe moving across the image from right to left); in the middle, the deflickered sequence; and on the right, the corresponding flicker map.

Flicker is considered a global artifact because it affects the entire frame, but the brightness fluctuations also frequently present spatial variations across each frame. These variations are usually very smooth but may appear as localized structures, like the black diagonal on the right of the frame in the second row (leftmost column) of Figure 11.13. Such structures are a typical manifestation of severe flicker on archive footage. An extreme case of localization appears in modern footage as in-scene flicker. This occurs when fluorescent lights are out of synchronization with the acquisition rate and the camera does not share the lights' location. If the fluorescent lights are inside a room, for instance, and the camera is outside the room, then only the doorway flickers whereas the rest of the scene does not. As is visible in Figure 11.14, the complexity of the scene means that removing the flicker requires pixel-wise granularity.

Deflicker techniques generally consist of two stages: (1) flicker model estimation and (2) flicker compensation, which aligns the brightness levels.

Flicker Models

In the general sense, the flicker artifact between two images u and v can be modeled as a mapping t on the grayscale component that depends on the pixel position x = (x, y):

$v(x) = t\big(u(x), x\big) + \varepsilon(x)$

The outlier term $\varepsilon(x)$ accounts for the image disparities due, for instance, to motion or missing data. The mapping at a particular pixel is usually modeled as being linear,

$t(u, x) = a(x)\,u + b(x),$

in modern footage [4, 63, 65] and nonlinear in old footage [66-68]. Ideally the mapping t should be estimated at every pixel site but this would require too much computation. Since the flicker artifact is spatially smooth, the solution adopted in the literature is to interpolate the mapping using 2D polynomial [4,65], cosine [63], or spline [66] functions. Using splines for instance yields the following interpolation [66]:

$t(u, x) = \sum_i w\big(x - x^{(i)}\big)\, t^{(i)}(u),$

where $x^{(i)}$ is the 2D position of the $i$th control point on the image, $t^{(i)}$ the estimated mapping at this control point, and $w(x)$ the interpolating 2D spline.
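As a rough illustration of this interpolation stage, the sketch below assumes the per-control-point mappings are linear (a gain and an offset at each control point, as in the linear model above) and uses an off-the-shelf cubic B-spline upsampler in place of the spline $w(x)$ of [66]. The function names, the grid layout, and the use of scipy are assumptions for this sketch, not details from the original work.

```python
import numpy as np
from scipy.ndimage import zoom

def interpolate_mapping(a_grid, b_grid, frame_shape):
    """Upsample per-control-point linear mappings t(u) = a*u + b to full
    frame resolution. a_grid/b_grid hold the gain/offset estimated at each
    control point on a coarse grid; order=3 gives cubic B-spline
    interpolation, standing in for the 2D spline w(x) in the text."""
    zy = frame_shape[0] / a_grid.shape[0]
    zx = frame_shape[1] / a_grid.shape[1]
    a = zoom(a_grid, (zy, zx), order=3)
    b = zoom(b_grid, (zy, zx), order=3)
    return a, b

def apply_mapping(u, a, b):
    """Apply the interpolated per-pixel linear mapping to frame u
    (assumed to hold 8-bit intensities stored as floats)."""
    return np.clip(a * u + b, 0.0, 255.0)
```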

The difficulty in estimating the flicker mapping is that not all of each image pair can be matched. The parts that cannot be matched arise from occlusion and uncovering due to motion, or simply from missing data (blotches/dirt/dropout) in the case of degraded film and video material. To cope with this problem, Roosmalen [4] and Yang [69] suggest detecting occluded areas by spotting large intensity differences that cannot be explained by flicker alone. Parameter estimation is then performed only on the blocks in which no outliers are detected, and estimates for the "missing" blocks are generated by a suitable interpolation algorithm. Unfortunately, this method of detecting outliers fails in the presence of heavy flicker degradation.
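In the spirit of that block-based scheme (though not the exact procedure of [4] or [69]), a minimal sketch might fit the linear mapping on each block by least squares and flag blocks whose fit residual is too large to be explained by flicker alone; the block size and threshold here are invented for illustration. Flagged blocks would then be filled from reliable neighbours before interpolating to full resolution, for example with the spline sketch above.

```python
import numpy as np

def estimate_block_mappings(u, v, block=32, resid_thresh=12.0):
    """Fit the linear flicker model v ~ a*u + b on each block of the frame
    pair (u, v), flagging blocks whose mean fit residual is too large
    (likely motion, occlusion, or blotches rather than flicker).
    Returns gain/offset grids plus a boolean reliability mask; block size
    and threshold are illustrative values only."""
    H, W = u.shape
    gh, gw = H // block, W // block
    a_grid = np.ones((gh, gw))
    b_grid = np.zeros((gh, gw))
    valid = np.zeros((gh, gw), dtype=bool)
    for i in range(gh):
        for j in range(gw):
            bu = u[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            bv = v[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            # Least-squares fit of v = a*u + b over the block.
            A = np.stack([bu, np.ones_like(bu)], axis=1)
            (a, b), *_ = np.linalg.lstsq(A, bv, rcond=None)
            if np.abs(a * bu + b - bv).mean() < resid_thresh:
                a_grid[i, j], b_grid[i, j] = a, b
                valid[i, j] = True
    return a_grid, b_grid, valid
```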

Flicker Compensation

The second step is to find the mapping that must be applied to each frame to compensate for the fluctuations and thus align the brightness levels. This can be done by estimating the flicker between the current frame $u_n$ and the last restored frame $u^R_{n-1}$ and applying the mapping to $u_n$. The brightness levels are then locked to match the levels of the first frame. To avoid error accumulation,

Roosmalen [4] relaxes the brightness stabilization by constructing the restored frame as a mixture of the locked frame $t_{n,n-1}(u_n)$ and the observed image $u_n$:

$u^R_n = k\, t_{n,n-1}(u_n) + (1 - k)\, u_n,$

where $k$ is a forgetting factor usually set between 0.85 and 0.9. There is thus a trade-off between the amount of flicker that can be removed and the propagation of errors through the restored sequence. The key to flicker compensation is actually to treat the problem as a filtering problem [63,66]: instead of locking the brightness to the previous frame, the brightness levels are averaged over the current frame and its past and future neighboring frames. The filtering idea [66] can be simplified as follows:

$u^R_n = \dfrac{1}{2T+1} \sum_{i=-T}^{T} t_{n,n+i}(u_n),$

where $t_{n,n+i}$ is the estimated flicker mapping that aligns the brightness levels of the current frame $u_n$ with those of the neighboring frame $u_{n+i}$. The temporal window $T$ can extend up to seven frames forward and backward. This compensation method yields a more stable brightness alignment and is also less dependent on a perfect flicker estimate, since the compensation draws on several estimates.
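A minimal sketch of this temporal-filtering compensation is given below, assuming the $2T+1$ mappings have already been estimated and are supplied as callables; the function name and the representation of the mappings are assumptions for illustration, not the interface of [66].

```python
import numpy as np

def compensate_flicker(u_n, mappings):
    """Temporal-filtering flicker compensation: average the current frame
    mapped towards each of its 2T+1 neighbours (the i = 0 mapping being
    the identity), so no single mis-estimated mapping dominates."""
    acc = np.zeros_like(u_n, dtype=np.float64)
    for t in mappings:
        acc += t(u_n)
    return acc / len(mappings)

# Hypothetical usage with global linear mappings t(u) = a*u + b:
#   mappings = [lambda u, a=a_i, b=b_i: a * u + b for (a_i, b_i) in params]
#   restored = compensate_flicker(frame, mappings)
```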

Modeling

Since the earliest work on deflickering by Roosmalen et al. in 1999, most deflicker algorithms have specifically targeted smooth instances of flicker. For non-parametric models, the limitation arises from the manipulation of local histograms [66, 70], which are not designed to be used at pixel resolution. For parametric methods, the limitation comes from the amount of reliable data required to estimate the mappings: because the parametric models involve more than one parameter per mapping, the problem becomes under-determined on flat areas and intractable at pixel resolution. The generic solution proposed in [4] is to invoke the smoothness assumption and interpolate the mapping on these flat areas from more reliable neighboring mappings. In 2006, however, Pitié et al. [71] introduced a model able to deal with in-scene flicker, that is, flicker that varies much more quickly across the image. A good example of this is shown in Figure 11.14, and it is that work which has been the most successful at removing flicker of many different kinds. The algorithm is in use today by The Foundry.

The basis of the idea is that, instead of looking for a complicated smoothness prior, the flicker model is given only one parameter per pixel while still being able to handle nonlinear flicker distortions. To arrive at the model, consider that a pixel is affected by only a percentage of the original flicker source:

$v(x) = \alpha(x)\, t_0\big(u(x)\big) + \big(1 - \alpha(x)\big)\, u(x).$

This model can be understood by assuming that, for in-scene flicker, the light that is the source of the flicker has a global impact $t_0(u(x))$, but that, owing to the scene geometry, a particular point receives only a percentage $\alpha(x)$ of this light. The complexity of the problem is then dramatically reduced, because the mapping estimation is done globally whilst the local variations are modeled with the single parameter $\alpha$. Thus, provided that the flicker derives from only one source, the problem comes down to the estimation of one parameter per pixel, which yields a fully determined problem. In practice this model needs to be re-parameterized before it can be useful; for details the reader is directed to Pitié et al. [71]. Figures 11.13 and 11.14 show results of deflickering with this idea as well as the resulting flicker map $\alpha(\cdot)$.
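As a rough sketch only (the actual re-parameterization used in [71] differs and its estimation procedure is more involved), the model above can be manipulated directly: given a global mapping $t_0$, obtained for example by global histogram matching against a reference frame, $\alpha(x)$ can be read off the model pixel by pixel. The names u_ref, t0, and eps below are assumptions for this illustration.

```python
import numpy as np

def estimate_alpha(u_ref, v, t0, eps=1e-3):
    """Per-pixel weight for the one-parameter in-scene flicker model
        v(x) = alpha(x)*t0(u(x)) + (1 - alpha(x))*u(x),
    i.e. v = u + alpha*(t0(u) - u), so alpha = (v - u) / (t0(u) - u).
    Where the global mapping t0 leaves a pixel unchanged, the model is
    undetermined; eps guards that case and alpha defaults to 0 there."""
    delta = t0(u_ref) - u_ref
    safe = np.where(np.abs(delta) > eps, delta, 1.0)
    alpha = np.where(np.abs(delta) > eps, (v - u_ref) / safe, 0.0)
    return np.clip(alpha, 0.0, 1.0)

def compensate(v, u_ref, t0, alpha):
    """Subtract the modelled flicker component from the observed frame v."""
    return v - alpha * (t0(u_ref) - u_ref)
```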


FIGURE 11.14

Example of in-scene flicker localization due to out of sync fluorescent lighting. On the left, the original sequence (note the fluctuations in the color of the window from frame to frame), in the middle, the deflickered sequence, and on the right the corresponding flicker map. The scene geometry is too complex to consider the flicker as having smooth variations.

An Evolving Industry

In a sense, the industry for digital visual manipulation is divided into two categories. Television broadcasters are mainly interested in real-time processing, while post-production houses (both film and video), who edit and apply effects to movies, are more interested in making the picture as good as possible. Users in post-production therefore tend to prefer general-purpose hardware running various software tools for editing and compositing, and hence are usually not concerned with real-time processing. Interestingly enough, it was television broadcast hardware producers who started building restoration systems in the late 1980s. The BBC used their hardware (Debra) for dirt and noise reduction in-house in the late 1980s; it ran in real time, was built out of discrete logic devices, and was physically quite large. Digital Vision (www.digitalvision.se) launched a PC-based system for color correction and dirt/noise reduction in 1988 that exploited dedicated hardware, again allowing real-time operation. Around 1997, Snell and Wilcox launched Archangel, dedicated hardware specifically targeting motion-compensated, real-time noise and dirt removal. Significantly, Archangel was the first hardware-based restoration system to arise from an EU research project, AURORA (1994-1998). Teranex Systems is a more recent arrival (circa 2002), using a massively parallel array of processors on a chip to create single hardware units that perform real-time noise/dirt/scratch concealment. The algorithms implemented in hardware tend to be deterministic in nature: clever use of motion-compensated filtering combined with simple decision making over multiple frames. Snell and Wilcox were also the first to introduce a real-time hardware line-scratch removal system, which was very successful.

It was in 1997 that the first software-based restoration systems emerged for film post-production; four appeared almost simultaneously. Lowry Digital was the first to use a massive network of Apple Macs to achieve fast throughput and high-quality film restoration; using their own software-designed systems, they operated essentially as a high-quality post-production unit for restoration. DUST, established in France, took a similar approach. MTI Film (www.mtifilm.com) was the first dedicated software system to appear for restoration, marketed to post-production houses and film studios. They therefore hold the accolade of being the first to design a professional restoration interface in software, showing a timeline, before/after views, and so on. DaVinci/Digital Revival, emerging from a collaboration with Cambridge University in 1996, followed shortly after, offering restoration software for networks of Linux machines. HS-ART released their Diamant software restoration system at about the same time. Diamant was also the result of an EU collaboration, driven by the needs of archivists; it is perhaps the only self-contained software system for editing and automated restoration available today. The Diamant product introduced an interesting innovation: a representation of the entire movie built from horizontal projections of each frame stacked side by side. This proved surprisingly useful for restoration, since flicker and shake in particular can easily be spotted in this representation. It is worth noting that film scanner manufacturers like Philips, Sony, Imagica, and Thomson all incorporate some level of software-based dirt and noise reduction in their systems today.

Since 2003, however, the software restoration space has become more interesting. The increasing speed of PCs and the large repositories of video found in communities like Google Video and YouTube mean that the need for video manipulation has become more mainstream. Software plug-ins that enable restoration within consumer and professional platforms like After Effects, Flame, Shake, and Final Cut Pro are now available from Adobe Systems, Autodesk, The Foundry, Red Giant Software, and Green Parrot Pictures. The PixelFarm has recently joined the ranks of post-production software manufacturers that see the growing niche of restoration systems as attractive. The Foundry and Red Giant Software, together with Green Parrot Pictures, were the first to use algorithms derived from Markov Random Field priors in dustbusting and motion interpolation. The continuing advance of high-definition television in 2009 also seems to be driving demand not only for restoration but for resolution improvement. HDTV sets are widely available, as is HD broadcasting, and this is coupled with the rapid proliferation of the Blu-ray HD format; viewers can therefore see defects much more readily than before, which drives the need for better-quality pictures. The growth of the industry therefore seems clear, and there is no reason to suspect that the demand for restoration will lessen in the near future.
