Feature Extraction (Media Processing) (Video Search Engines)

Feature extraction means processing a series of media samples (signals) and creating more efficient representations that will eventually be useful in deriving meaning about the content the signals represent. For media processing this generally achieves a large data reduction, sometimes on the scale of several orders of magnitude. It is typically neither practical nor useful for all stages of media processing to operate directly on samples that are intended primarily for regenerating the signal. The extracted features can be straightforward statistical measures such as means or moments, but in many cases the features are intended to model human perception or physiology, so more advanced transforms are used (e.g. Mel Frequency Cepstral Coefficients).

Note that the task of reducing data while retaining perceptually meaningful information is not only the goal of this stage of media processing; it is also the goal of media compression. Therefore, in many cases the same features are used, and the theoretical basis as well as the algorithms and implementations can be re-used. In fact, many practical media processing algorithms are designed to operate in the compressed domain. While features optimized for compression are not necessarily optimal for analysis, they are reasonably good, and a large measure of system efficiency can be realized by adopting them to do double duty in analysis as well as compression.

Although we have stressed the importance of data reduction, often the first operation in generating features is a transformation, e.g. from the time or spatial domain to the frequency domain, which is not inherently lossy. It is just that in this new space it may become more straightforward to truncate features in such a way that, when the inverse transform is performed, the result is minimal perceived signal degradation. Thus we can speak of a feature space in which samples are represented as feature vectors.
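The transform-then-truncate idea can be sketched in a few lines. This is an illustrative reduction using the magnitude of a Fourier transform, not the MFCC pipeline a real audio system would use; the signal, sample rate, and coefficient count are all arbitrary choices for the example.

```python
import numpy as np

def spectral_features(signal, n_coeffs=16):
    """Map a 1-D signal to the frequency domain (lossless), then keep only
    the lowest-frequency coefficient magnitudes as a compact feature vector
    (the lossy truncation step)."""
    spectrum = np.fft.rfft(signal)      # transformation: time -> frequency
    return np.abs(spectrum[:n_coeffs])  # truncation: large data reduction

# One second of a synthetic 8 kHz signal: 8000 samples reduce to 16 features.
t = np.linspace(0, 1, 8000, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
features = spectral_features(signal)
print(features.shape)  # (16,)
```

Here the data reduction is a factor of 500, on the order of what the text describes, while the retained coefficients still characterize the dominant frequency content.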
Typically we would like to keep the dimensionality of the vectors as small as possible to improve system efficiency and perhaps generalizability, but in some cases high dimensionality is not an insurmountable problem, particularly when the feature vectors are sparse. One notable exception to the notion of data reduction is the case of query expansion, where there is typically a paucity of features – the source data may be a single word entered by a user – so it is desirable to appeal to ancillary data sources in an attempt to create additional features that may capture the intent of the query.
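Query expansion can be illustrated with a toy sketch. The synonym table below is a hypothetical stand-in for the ancillary data sources mentioned above (a thesaurus, query logs, or a knowledge base in a real system):

```python
# Hypothetical synonym table standing in for a real ancillary data source.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie", "motion picture"],
}

def expand_query(term):
    """Grow a feature-poor query (possibly a single word) into a
    richer set of terms that may better capture the user's intent."""
    return [term] + SYNONYMS.get(term, [])

print(expand_query("car"))  # ['car', 'automobile', 'vehicle']
```

Note the direction is the opposite of the rest of this section: instead of reducing data, we deliberately add features because the source provides so few.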


The problem of feature selection arises simply because it is generally easier to generate features than to determine which features have value for a desired application. It may not be computationally practical to use all extracted features, and in fact using all features may have a detrimental effect on the accuracy of the results.
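As a minimal sketch of feature selection, the example below ranks feature columns by variance and keeps only the top few. Variance is just one of many selection criteria (mutual information and wrapper methods are common alternatives); the data here is synthetic, with two columns scaled up so they are clearly the most variable.

```python
import numpy as np

def select_by_variance(X, k):
    """Return the (sorted) indices of the k feature columns with the
    highest variance across samples -- one simple selection criterion."""
    order = np.argsort(X.var(axis=0))[::-1]
    return np.sort(order[:k])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples, 10 candidate features
X[:, 3] *= 5.0                   # inflate two columns so they dominate
X[:, 7] *= 4.0
print(select_by_variance(X, 2))  # [3 7]
```

Discarding the remaining eight columns cuts the downstream cost, and, as the text notes, can also improve accuracy by removing features that contribute mostly noise.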

In algorithm design, the invariance of the features must be taken into account. In image processing, scale, rotation and translation invariance are generally desirable, but there may be limits. For example, we may build a face detection system based on spatial relations of low-level features, and then use the presence, location and orientation of the detected faces as higher-level features for later processing in story segmentation. We want our face detector to be invariant to slight rotations of the face about an axis perpendicular to the image plane, but not to rotations on the order of 180 degrees: a report of an upside-down face is either an error or a situation of no relevance.
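A simple example of a feature with strong built-in invariance is a global intensity histogram, sketched below. It is unchanged by translation and by any rotation, including 180 degrees, which illustrates the trade-off above: full rotation invariance means the feature cannot distinguish an upright face from an upside-down one.

```python
import numpy as np

def intensity_histogram(image, bins=8):
    """Normalized global intensity histogram: invariant to translation
    and to any rotation of the image -- sometimes more invariance
    than the application actually wants."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(32, 32))   # synthetic 8-bit image
flipped = np.rot90(img, 2)                  # rotate 180 degrees
print(np.allclose(intensity_histogram(img),
                  intensity_histogram(flipped)))  # True
```

Features built on spatial relations, by contrast, change under large rotations, which is exactly why the face detector in the example can (and should) reject an upside-down face.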
