11.2.2 User Evaluation and Popularity Prediction
The most obvious way of validating an AJS (at least one with learning capacities)
may be to employ a set of images pre-evaluated by humans. The task of the AJS
is to classify or "to assign an aesthetic value to a series of artworks which were
previously evaluated by humans" (Romero et al. 2003).
Several relevant papers published in the image processing and computer vision
literature are aimed at classifying images based on aesthetic evaluation. Most of
them employ datasets obtained from photography websites. Some of those datasets
are public, so they allow other AJSs to be tested on them. In this section we briefly
analyse some of the most prominent works of this type.
Ke et al. (2006) proposed the task of distinguishing between "high quality profes-
sional photos" and "low quality snapshots". These categories were created from
users' evaluations on a photography website, so, to some extent, this can be considered
a classification based on aesthetic preference. The website was the dpchallenge.com
photography portal, and the authors used the 10% highest- and lowest-rated images,
in terms of average score, from a set of 60,000. Each photo was rated by at least 100
users; images with intermediate scores were not considered.
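To make this selection rule concrete, the following is a minimal sketch of the dataset-construction step, assuming ratings are available as (photo_id, list of ratings) pairs; the function and its field layout are illustrative and not taken from Ke et al. (2006).

```python
def build_quality_dataset(photos, min_ratings=100, fraction=0.10):
    """photos: iterable of (photo_id, list_of_ratings) pairs.

    Keeps only photos with at least `min_ratings` ratings, labels the top
    `fraction` by average score as 1 ("professional photo") and the bottom
    `fraction` as 0 ("snapshot"), and drops the intermediate scores.
    """
    rated = [(pid, sum(r) / len(r)) for pid, r in photos if len(r) >= min_ratings]
    rated.sort(key=lambda item: item[1])          # ascending by average score
    k = int(len(rated) * fraction)
    low = [(pid, 0) for pid, _ in rated[:k]]      # lowest-rated 10%
    high = [(pid, 1) for pid, _ in rated[-k:]]    # highest-rated 10%
    return low + high
```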
The authors employed a set of high-level image features (such as spatial distri-
bution of edges, colour distribution, blur and hue count) and a support vector machine
classifier, obtaining a correct classification rate of 72%. Combining these metrics
with those published by Tong et al. (2004), Ke et al. (2006) achieved a success rate
of 76%.
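A minimal sketch of this pipeline, handcrafted per-image feature vectors fed to a support vector machine, is shown below using scikit-learn. The kernel and cross-validation settings are assumptions, not the configuration reported by Ke et al. (2006), and the feature extraction is assumed to have been done beforehand.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def evaluate_quality_classifier(X, y):
    """X: array of per-image feature vectors (edge distribution, colour
    distribution, blur, hue count, ...); y: 1 = professional, 0 = snapshot."""
    X, y = np.asarray(X), np.asarray(y)
    clf = SVC(kernel="rbf")                         # kernel choice is an assumption
    rate = cross_val_score(clf, X, y, cv=5).mean()  # mean correct-classification rate
    return clf.fit(X, y), rate
```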
Luo and Tang (2008) employed the same database. The 12,000 images of the
dataset are accessible online,2 allowing results to be compared. Unfortunately,
neither the statistical information about the images (number of evaluations, average
score, etc.) nor the images with intermediate ratings are available. The dataset is
divided into two sets (training and test) of 6,000 images each. The authors state
that these sets were created randomly. However, when one reverses the roles of the
test and training sets (i.e. training with the original "test" set and testing with the
original "training" set), the results differ significantly, which indicates that the two
sets are not well balanced.
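A simple way to expose such an imbalance is to train on one half and test on the other, then swap the roles of the two halves and compare the accuracies; the sketch below uses an SVM from scikit-learn purely as a stand-in classifier.

```python
from sklearn.svm import SVC

def swap_check(X_a, y_a, X_b, y_b):
    """Train on set A and test on set B, then do the reverse.
    A large gap between the two accuracies suggests the split is unbalanced."""
    acc_forward = SVC().fit(X_a, y_a).score(X_b, y_b)   # train A -> test B
    acc_reverse = SVC().fit(X_b, y_b).score(X_a, y_a)   # train B -> test A
    return acc_forward, acc_reverse
```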
Additionally, Luo and Tang (2008) used a blur filter to separate the subject of
each photo from its background. They then employed a set of features related to
clarity contrast (the difference in crispness between the subject region and the
background of the photo), lighting, simplicity, composition and colour harmony.
Using all features they obtained a 93% success rate, which clearly improved upon
previous results; the "clarity contrast" feature alone yields a success rate above
85%. The authors attributed the difference between their results and the ones
obtained by Ke et al. (2006) to the application of metrics to the image background
regions and to the greater adequacy of the metrics themselves.
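As a rough illustration of what a clarity-contrast-style feature measures, the sketch below compares the sharpness of the subject region with that of the background, approximating sharpness by the variance of the Laplacian (an assumption) and taking the subject mask as given; Luo and Tang's blur-based subject extraction is not reproduced here.

```python
import cv2
import numpy as np

def clarity_contrast(image_bgr, subject_mask):
    """image_bgr: colour image; subject_mask: boolean array, True inside the subject.
    Returns a large positive value for a sharp subject on a blurred background."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    lap = cv2.Laplacian(gray, cv2.CV_64F)            # edge response over the whole image
    subject_sharpness = lap[subject_mask].var()      # crispness of the subject region
    background_sharpness = lap[~subject_mask].var()  # crispness of the background
    return subject_sharpness - background_sharpness
```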
2 http://137.189.97.48/PhotoqualityEvaluation/download.html