was considered promising by the authors. The same approach could be applied to
images if we use the number of times an image is downloaded or the number of hits
of its high-resolution version.
All these works rely on the use of photography and artistic websites. While these
sites provide large datasets created by a third party, which should minimise the
chances of bias, the approach has several shortcomings for the purposes of
AJS validation.
The experimental environment (participants and methodology) is not as controlled
as in a psychological test, and several exogenous factors may influence the
image scores. It is not possible to obtain complete information about the participants
and the circumstances in which they voted. Personal relations between users may
affect their judgements, the same person may cast more than one vote, and so on.
It is also difficult to know what the users are evaluating when they vote. At
photo.net the users can rate each image according to its “aesthetic” and “originality”;
however, these scores are highly correlated (Datta et al. 2006), which indicates
that users were not differentiating between these criteria. Since the selection
of images is not under the control of the researcher, the aesthetic evaluation can be
highly influenced by the semantics of the content, novelty, originality, and so on. These
websites also include some level of competition (in fact, dpchallenge.com is a contest),
so the possibility of biased votes is even higher.
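As a rough illustration, the degree to which two such scores co-vary can be checked with a simple Pearson correlation over the per-image averages. The following sketch assumes the collected ratings are stored in a CSV file; the file name and column labels are hypothetical placeholders, not part of any of the cited works:

    # Minimal sketch: correlation between per-image "aesthetic" and
    # "originality" scores. File and column names are hypothetical.
    import csv

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx ** 0.5 * vy ** 0.5)

    with open("ratings.csv") as f:
        rows = list(csv.DictReader(f))
    aesthetic = [float(r["aesthetic"]) for r in rows]
    originality = [float(r["originality"]) for r in rows]
    print(f"Pearson r = {pearson(aesthetic, originality):.3f}")

A value of r close to 1 would support the observation that voters do not distinguish between the two criteria.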
The interpretation of the results obtained by an AJS in this kind of test is not
straightforward. Different datasets have different levels of difficulty, so a percentage
of correct answers of, e.g., 78 % can be a good or a bad score. As such, the
comparison with the state of the art becomes of huge importance. Additionally, it
may also be valuable to consider the difficulty of the task for humans, i.e. to estimate
the discrepancy between the success rate of the AJS and the success rates obtained
by humans. Although this is not possible for the previously mentioned datasets, if
the dataset includes all the voting information, one can calculate the agreement between
humans and the AJS. In other words, one can check whether the response of the AJS falls
within the standard deviation of the human responses.
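A minimal sketch of this agreement check, assuming the full list of human votes per image is available (the data structures and names below are illustrative, not drawn from the datasets discussed here):

    # Minimal sketch: fraction of images where the AJS score falls
    # within one standard deviation of the mean human vote.
    from statistics import mean, stdev

    def within_human_spread(ajs_score, votes):
        # Requires at least two human votes per image.
        mu, sigma = mean(votes), stdev(votes)
        return abs(ajs_score - mu) <= sigma

    # ajs_scores: image id -> score predicted by the AJS
    # votes_per_image: image id -> list of human votes
    def agreement_rate(ajs_scores, votes_per_image):
        hits = sum(within_human_spread(ajs_scores[i], votes_per_image[i])
                   for i in votes_per_image)
        return hits / len(votes_per_image)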
For the purposes of AJS validation, the dataset should neither be trivial nor allow
shortcuts that enable the system to perform the task by exploiting properties of
the artefacts which are not related to the task. Teller and Veloso (1996) discovered
that their genetic programming approach to face recognition was identifying
subjects based on the contents of the background of the images (the photographs had
been taken in different offices) instead of on the faces. The same type of effect may
happen in aesthetic judgement tests unless proper measures are taken. For instance,
good photographers tend to have good cameras and take good photographs. A system
may correctly classify photographs by recognising a good camera (e.g. a high-resolution
one) instead of recognising the aesthetic properties of the images. Thus,
it is necessary to take appropriate precautions to avoid this type of exploitation
(e.g. reducing all the images to a common resolution before they are submitted to
the classifier, as sketched below). This precaution has been taken in the works mentioned in Sect. 11.3
of this chapter. Nevertheless, it is almost impossible to ensure that the judgements
are made exclusively on aesthetic properties.
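Such a normalisation step is straightforward to implement. The sketch below uses the Pillow library; the directory names and target size are assumptions for illustration rather than values prescribed by the works cited above:

    # Minimal sketch: reduce all images to a common resolution so the
    # classifier cannot exploit camera quality as a shortcut.
    # Paths and target size are hypothetical.
    from pathlib import Path
    from PIL import Image

    TARGET = (256, 256)
    out_dir = Path("normalised")
    out_dir.mkdir(exist_ok=True)

    for path in Path("dataset").glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        # Downsample, preserving the aspect ratio.
        img.thumbnail(TARGET, Image.LANCZOS)
        img.save(out_dir / path.name, quality=90)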