Fig. 4.3 A high-level system pipeline for motif discovery. An ML model is trained on pre-processed
music themes. Pitch detection is performed on an audio file or edge detection is performed on an
image file in order to extract a sequence of notes. The sequence of notes is segmented into a set of
candidate motifs, and only the most probable motifs according to the ML model are selected
4.2.2.2 Audio Pitch Detection
Pitch detection is performed on the audio file using Aubio,² an open-source
command-line utility that combines note onset detection and pitch detection to
output a string of notes (each consisting of a pitch and a duration). The string of
detected notes is post-processed to make the sequence more manageable: each duration
is quantized to the nearest 32nd-note value, and each pitch interval larger than an
octave is replaced by the equivalent interval smaller than an octave.
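The post-processing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that pitches are MIDI note numbers, that durations are measured in quarter-note beats (so a 32nd note is 0.125), that a quantized duration is never shorter than one 32nd note, and that an oversized interval is folded by taking its size modulo 12 while keeping its direction. The names `quantize_duration`, `fold_interval`, and `post_process` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (MIDI pitch, duration in quarter-note beats).
Note = Tuple[int, float]

THIRTY_SECOND = 0.125  # one 32nd note, in quarter-note beats (assumption)


def quantize_duration(duration: float, step: float = THIRTY_SECOND) -> float:
    """Round a duration to the nearest multiple of `step` (at least one step)."""
    steps = max(1, int(duration / step + 0.5))
    return steps * step


def fold_interval(interval: int) -> int:
    """Reduce an interval larger than an octave, keeping its direction."""
    if abs(interval) <= 12:
        return interval
    sign = 1 if interval > 0 else -1
    return sign * (abs(interval) % 12)


def post_process(notes: List[Note]) -> List[Note]:
    """Quantize durations and fold intervals between consecutive notes."""
    if not notes:
        return []
    first_pitch, first_duration = notes[0]
    result = [(first_pitch, quantize_duration(first_duration))]
    for (prev_pitch, _), (pitch, duration) in zip(notes, notes[1:]):
        # Fold the interval measured in the detected sequence, then apply
        # it to the previous note of the rebuilt sequence.
        interval = fold_interval(pitch - prev_pitch)
        result.append((result[-1][0] + interval, quantize_duration(duration)))
    return result


if __name__ == "__main__":
    detected = [(60, 0.27), (76, 0.5), (74, 0.06)]
    print(post_process(detected))
```

Folding is applied to the intervals rather than to absolute pitches, so the melodic contour (up or down) of the detected sequence is preserved while large leaps are compressed.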
4.2.2.3 Image Edge Detection
Edge detection is performed on an image using a Canny edge detector,³ which returns
a new image composed of black and white pixels. The original image is also converted
to a greyscale image. To extract strings of notes analogous to those extracted from
audio, both images are traversed one pixel at a time in a spiral pattern, starting
from the outside and working inward. For each sequence of b contiguous black pixels
(delimited by white pixels) in the edge-detected image, a single note is created. The
pitch of the note is the average intensity of the corresponding b pixels in the greyscale
image, and the duration of the note is proportional to b.
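The spiral traversal and the run-to-note conversion can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the edge-detected image is represented as a 2D list in which a truthy value marks a black pixel, the greyscale image is a 2D list of intensities of the same size, the spiral runs clockwise from the top-left corner, and the average intensity is returned as-is without being mapped to a particular pitch range. The names `spiral_coords`, `notes_from_image`, and `duration_per_pixel` are hypothetical.

```python
from typing import List, Sequence, Tuple


def spiral_coords(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return (row, col) pairs in a clockwise spiral from the outside inward."""
    coords = []
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # top edge, left to right
            coords.append((top, c))
        for r in range(top + 1, bottom + 1):      # right edge, downward
            coords.append((r, right))
        if top < bottom:
            for c in range(right - 1, left - 1, -1):  # bottom edge, right to left
                coords.append((bottom, c))
        if left < right:
            for r in range(bottom - 1, top, -1):      # left edge, upward
                coords.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return coords


def notes_from_image(
    edges: Sequence[Sequence[int]],
    grey: Sequence[Sequence[float]],
    duration_per_pixel: float = 1.0,
) -> List[Tuple[float, float]]:
    """Turn each run of b contiguous black pixels into one (pitch, duration) note."""
    if not edges or not edges[0]:
        return []
    notes = []
    run: List[float] = []  # greyscale intensities of the current black run
    for r, c in spiral_coords(len(edges), len(edges[0])):
        if edges[r][c]:
            run.append(grey[r][c])
        elif run:
            # A white pixel closes the run: pitch is the mean intensity,
            # duration is proportional to the run length b.
            notes.append((sum(run) / len(run), len(run) * duration_per_pixel))
            run = []
    if run:  # a run that reaches the centre of the spiral
        notes.append((sum(run) / len(run), len(run) * duration_per_pixel))
    return notes


if __name__ == "__main__":
    edges = [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
    grey = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    print(notes_from_image(edges, grey))
```

Because both images are visited in the same order, each black run in the edge image lines up with the greyscale pixels at the same coordinates.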
4.2.2.4 Motif Discovery
After the string of notes is detected and processed, candidate motifs are extracted
(see Algorithm 1). All contiguous motifs of length greater than or equal to l_min
² http://www.aubio.org
³ http://www.tomgibara.com/computer-vision/canny-edge-detector