was considered promising by the authors. The same approach could be applied to
images if we use the number of times an image is downloaded or the number of hits
of its high-resolution version.
All these works rely on the use of photography and artistic websites. While these
sites provide large datasets created by a third party, which should minimise the
chances of bias, the approach has several shortcomings for the purposes of
AJS validation.
The experimental environment (participants and methodology) is not as controlled
as in a psychological test, and several exogenous factors may influence the
image scores. It is not possible to obtain complete information about the participants
and the circumstances in which they voted. Personal relations between users may
affect their judgements, the same person may cast more than one vote, and so on.
It is also difficult to know what the users are evaluating when they vote. At
photo.net the users can rate each image according to its “aesthetic” and “originality”;
however, these scores are highly correlated (Datta et al. 2006), which indicates
that users were not differentiating between these criteria. Since the selection
of images is not under the control of the researcher, the aesthetic evaluation can be
highly influenced by the semantics of the content, novelty, originality, and so on. These
websites also include some level of competition (in fact, dpchallenge.com is a contest),
so the possibility of biased votes is even higher.
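As a rough illustration, the degree to which two such scores co-vary can be checked with a simple Pearson correlation over the per-image averages. The following sketch assumes the collected ratings are stored in a CSV file; the file name and column labels are hypothetical placeholders, not part of any of the cited works:

    # Minimal sketch: correlation between per-image "aesthetic" and
    # "originality" scores. File and column names are hypothetical.
    import csv

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx ** 0.5 * vy ** 0.5)

    with open("ratings.csv") as f:
        rows = list(csv.DictReader(f))
    aesthetic = [float(r["aesthetic"]) for r in rows]
    originality = [float(r["originality"]) for r in rows]
    print(f"Pearson r = {pearson(aesthetic, originality):.3f}")

A value of r close to 1 would support the observation that voters do not distinguish between the two criteria.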
The interpretation of the results obtained by an AJS in this kind of test is not
straightforward. Different datasets have different levels of difficulty, so a percentage
of correct answers of, e.g., 78 % can be a good or a bad score. As such, the
comparison with the state of the art becomes of huge importance. Additionally, it
may also be valuable to consider the difficulty of the task for humans, i.e. to estimate
the discrepancy between the success rate of the AJS and the success rates obtained
by humans. Although this is not possible for the previously mentioned datasets, if
the dataset includes all the voting information, one can calculate the agreement between
humans and the AJS. In other words, one can check whether the response of the AJS falls
within the standard deviation of the human responses.
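A minimal sketch of this agreement check, assuming the full list of human votes per image is available (the data structures and names below are illustrative, not drawn from the datasets discussed here):

    # Minimal sketch: fraction of images where the AJS score falls
    # within one standard deviation of the mean human vote.
    from statistics import mean, stdev

    def within_human_spread(ajs_score, votes):
        # Requires at least two human votes per image.
        mu, sigma = mean(votes), stdev(votes)
        return abs(ajs_score - mu) <= sigma

    # ajs_scores: image id -> score predicted by the AJS
    # votes_per_image: image id -> list of human votes
    def agreement_rate(ajs_scores, votes_per_image):
        hits = sum(within_human_spread(ajs_scores[i], votes_per_image[i])
                   for i in votes_per_image)
        return hits / len(votes_per_image)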
For the purposes of AJS validation, the dataset should neither be trivial nor allow
shortcuts that enable the system to perform the task by exploiting properties of
the artefacts which are not related to the task. Teller and Veloso (1996) discovered
that their genetic programming approach to face recognition was identifying
subjects based on the contents of the background of the images (the photographs had
been taken in different offices) instead of on the faces. The same type of effect may
happen in aesthetic judgement tests unless proper measures are taken. For instance,
good photographers tend to have good cameras and take good photographs. A system
may correctly classify photographs by recognising a good camera (e.g. a high-resolution
one) instead of recognising the aesthetic properties of the images. Thus,
it is necessary to take appropriate precautions to avoid this type of exploitation
(e.g. reducing all the images to a common resolution before they are submitted to
the classifier, as sketched below). This precaution has been taken in the works mentioned in Sect. 11.3
of this chapter. Nevertheless, it is almost impossible to ensure that the judgements
are made exclusively on aesthetic properties.
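Such a normalisation step is straightforward to implement. The sketch below uses the Pillow library; the directory names and target size are assumptions for illustration rather than values prescribed by the works cited above:

    # Minimal sketch: reduce all images to a common resolution so the
    # classifier cannot exploit camera quality as a shortcut.
    # Paths and target size are hypothetical.
    from pathlib import Path
    from PIL import Image

    TARGET = (256, 256)
    out_dir = Path("normalised")
    out_dir.mkdir(exist_ok=True)

    for path in Path("dataset").glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        # Downsample, preserving the aspect ratio.
        img.thumbnail(TARGET, Image.LANCZOS)
        img.save(out_dir / path.name, quality=90)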