Datta et al. (2006) employed colour, texture, shape and composition, high-level
ad hoc features and a support vector machine to classify images gathered from a
photography portal (photo.net). The dataset included 3,581 images. All the images
were evaluated by at least two persons. Unfortunately, the statistical information
for each image, namely the number of votes, the value of each vote, etc., is not
available. Similarly to previous approaches, they considered two image categories:
the highest rated images (average aesthetic value ≥ 5.8, a total of 832 images) and
the lowest rated ones (≤ 4.2, a total of 760 images), according to the ratings given by
the users of the portal. Images with intermediate scores were discarded. Datta's jus-
tification for making this division is that photographs with an intermediate value
“are not likely to have any distinguishing feature, and may merely be representing
the noise in the whole peer-rating process” (Datta et al. 2006). The system obtained
70.12 % classification accuracy. The authors published the original dataset of this
experiment, allowing future comparisons with other systems.
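The class construction just described — keep only images whose average rating clears one of the two thresholds and discard the ambiguous middle — can be sketched in a few lines. This is a minimal illustration: the thresholds come from the text, but the ratings dictionary is invented.

```python
# Split images into "high" / "low" aesthetic classes by average rating,
# discarding intermediate scores, as in Datta et al. (2006).
HIGH_THRESHOLD = 5.8  # average rating >= 5.8 -> high class
LOW_THRESHOLD = 4.2   # average rating <= 4.2 -> low class

# Hypothetical per-image ratings: image id -> list of user votes.
ratings = {
    "id001": [6.2, 5.9, 6.5],
    "id002": [4.0, 3.8],
    "id003": [5.0, 5.2, 4.9],  # intermediate score: discarded
}

high, low = [], []
for image_id, votes in ratings.items():
    avg = sum(votes) / len(votes)
    if avg >= HIGH_THRESHOLD:
        high.append(image_id)
    elif avg <= LOW_THRESHOLD:
        low.append(image_id)

print(high, low)  # ['id001'] ['id002']
```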
Wong and Low (2009) employed the same dataset, but selected only the 10 % highest
and 10 % lowest rated images. The authors extracted the salient regions of the
images using a visual saliency model. They used global metrics related to sharpness,
contrast, luminance, texture detail and low depth of field, together with features of
the salient regions based on exposure, sharpness and texture detail. Using a support
vector machine classifier they obtained 78 % accuracy under 5-fold cross-validation.
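Both studies follow the same pattern: extract a fixed-length feature vector per image, train a support vector machine, and report cross-validated accuracy. A hedged sketch of that pipeline using scikit-learn — the feature values below are random stand-ins, not the actual sharpness, contrast or saliency metrics of either paper:

```python
# Sketch of the feature-vector -> SVM -> 5-fold cross-validation pipeline.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_features = 12  # hypothetical: global metrics plus salient-region features

# Random stand-in feature vectors for two classes, deliberately separated
# so the toy classifier has something to learn.
X = np.vstack([
    rng.normal(loc=1.0, size=(100, n_features)),   # "high" rated images
    rng.normal(loc=-1.0, size=(100, n_features)),  # "low" rated images
])
y = np.array([1] * 100 + [0] * 100)

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f}")
```

On real photographic features the two classes overlap far more than this toy data does, which is why the reported accuracies sit around 70–78 % rather than near 100 %.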
In order to create a basis for research on aesthetic classification, Datta et al.
(2008) proposed three types of aesthetic classification: aesthetic score prediction,
aesthetic class prediction and emotion prediction. All the experiments described in
this section rely on aesthetic class prediction. They also published four datasets: the
one employed in Datta et al. (2006), and three others extracted from photo.net (16,509
images), dpchallenge.com (14,494 images) and “Terragalleria” (14,494 images).³
These three datasets include, for each image, the number of votes and their
distribution across score values (e.g. the number of users that assigned a vote of “2”
to image “id454”). Moreover, a dataset from the website “Alipr” with 13,100
emotion-tagged images is included.
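The vote histograms distributed with these datasets make it straightforward to recover per-image statistics such as the total number of votes and the average score. A small sketch — the histogram shown for “id454” is invented for illustration:

```python
# Recover vote count and average score from a per-image vote histogram
# of the kind distributed with the Datta et al. (2008) datasets:
# vote value -> number of users who cast that vote.
votes = {1: 2, 2: 5, 3: 14, 4: 9, 5: 3}  # hypothetical histogram for "id454"

n_votes = sum(votes.values())
average = sum(value * count for value, count in votes.items()) / n_votes
print(n_votes, round(average, 2))  # 33 3.18
```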
Although not within the visual domain, it is worth mentioning the work carried
out by Manaris et al. (2007), in which a system was trained to distinguish between
popular (high number of downloads) and unpopular (low number of downloads)
classical music. The dataset was obtained from downloads of the website Classical
Music Archive (http://www.classicalarchives.com) in November 2003. Two sets,
with high and low numbers of downloads, were created, in a similar way to the
previously mentioned works. The “popular” set contained 305 pieces, each one with
more than 250 hits, while the “not popular” set contained 617 pieces with fewer
than 22 downloads. The system relies on a set of metrics based on Zipf's Law,
applied to musical concepts such as pitch, duration, harmonic intervals, melodic
intervals, harmonic consonance, etc. The classification system is based on an
artificial neural network. The success rate was 87.85 % (810 out of 922 instances
classified correctly).
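A Zipf-based metric of the kind used here can be sketched as the slope of a least-squares fit on a log–log rank–frequency plot: a slope near −1 indicates a Zipfian distribution. A minimal, standard-library-only illustration over an invented pitch sequence (a real system would compute such slopes for pitch, duration, intervals, and the other concepts listed above):

```python
# Fit the slope of log(frequency) vs log(rank) for event counts,
# the core of a Zipf's-Law metric over pitches, durations, intervals, etc.
import math
from collections import Counter

def zipf_slope(events):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(events).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical pitch sequence (MIDI note numbers) whose counts follow
# an ideal 1/rank distribution, so the fitted slope is -1.
pitches = [60] * 60 + [62] * 30 + [64] * 20 + [65] * 15 + [67] * 12
print(round(zipf_slope(pitches), 2))  # -1.0
```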
³ Available from http://ritendra.weebly.com/aesthetics-datasets.html.