ISMIR2020 has been a great conference for me. As a second-time attendee, it was very close to my last year’s experience in terms of new inspiring projects and ideas. Although, of course, I missed the social aspect very much, and the opportunity to visit Montreal!
One of my highlights was Safiya Noble’s keynote in which she puts Big Tech into context with other historical lobbies such as Big Cotton or Big Tobacco. It was very nice to see that the community cares about the ethical implications of the new technologies we are helping to craft.
I also enjoyed the session on Validity in MIR with Arthur Flexer, Bob Sturm and Julián Urbano. Many times we see how models that perform well in the lab fail when we free them into the wild. This session offered a discussion about experimental design choices and statistical approaches to validate our research better. I think some of these ideas are going to be very relevant for my first steps in the Ph.D.
Now, after a week, I have finished processing all the information. I have come up with a selection of the papers that I enjoyed the most. Here they are listed just in the order they appeared in my notes:
Semi-supervised Learning Using Teacher-student Models for Vocal Melody Extraction
Sangeun Kum, Jing-Hua Lin, Li Su and Juhan Nam
An application of the teacher-student paradigm for melody extraction. Training in predictions generated by a supervised teacher network helps a (noisy) student to improve performance on unseen datasets.The Freesound Loop Dataset and Annotation Tool
Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han and Xavier Serra
A multi-class dataset for audio loops with an online annotation tool. In our late-breaking demo [1], we used a single-label subset of it to train a loop musical role classifier. If you like this work, you can help with the ongoing annotation process!Can’t Trust the Feeling? How Open Data Reveals Unexpected Behavior of High-level Music Descriptors
Cynthia C. S. Liem and Chris Mostert
Interesting work assessing the AcousticBrainz SVM models. The conclusion is pretty aligned with previous work evidencing the lack of generalization of those models [2]. This work goes further and points to certain aspects that correlate with the predictions, such as the Essentia version used to extract the features, the audio codec, or the bit depth.Metric Learning vs Classification for Disentangled Music Representation Learning
Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin and Juhan Nam
This work compares metric learning, classification, and proxy-based models with and without an embedding disentanglement mechanism in various downstream tasks. The models based on classification tasks are superior, and the disentanglement helps to achieve a new state of the art for auto-tagging.Drumgan: Synthesis of Drum Sounds with Timbral Feature Conditioning Using Generative Adversarial Networks
Javier Nistal, S. Lattner and G. Richard
Really nice sounding drum hits generated with GANs and conditioned on AudioCommons’ timbral attributes. The sound quality doesn’t degrade even under extreme conditioning parameters. According to the author, this may be due to noisy input, which prevents the model from overreacting to unseen conditioning values.Ultra-light Deep MIR by Trimming Lottery Tickets
Philippe Esling, Theis Bazin, Adrien Bitton, Tristan Carsault and Ninon Devis
A simple technique that allows significant model compression in terms of compute. Very relevant for the deployment of real-time audio/music models.Essentia.js: a JavaScript Library for Music and Audio Analysis on the Web
Albin Correya, Dmitry Bogdanov, Luis Joglar-Ongay and Xavier Serra
Work produced by my colleges that obtained the Best Reproducibility award! It presents the integration of Essentia into JavaScript, which opens the door to scalable, serverless MIR in the browser.Deconstruct, Analyse, Reconstruct: How to Improve Tempo, Beat, and Downbeat Estimation
Sebastian Böck and Matthew Davies
This work revisits a previous beat-tracking model facing it as a multi-task problem, enhancing the CNN architecture, and applying data augmentation to achieve a new state of the art in the task.Multilingual Music Genre Embeddings for Effective Cross-lingual Music Item Annotation
Elena V. Epure, Guillaume Salha and Romain Hennequin
They learn multilingual genre embeddings without a parallel corpus. The method proves to be useful for translating the genres between different languages and taxonomies. I believe that this technique can be convenient for combining genre datasets with slightly divergent taxonomies.Joyful for You and Tender for Us: the Influence of Individual Characteristics and Language on Emotion Labeling and Classification
Juan Sebastián Gómez-Cañón, Estefanía Cano, Perfecto Herrera and Emilia Gómez
In MIR, we typically train emotion-related classifiers assuming that the labelsad
in a music track is as objective as the labelapple
in a picture from ImageNet. This work finds that your familiarity with the piece, your ability to understand lyrics, your musical knowledge, or your mother tongue have an impact on how you perceive emotion labels, especially for the most complex ones. I find this work very relevant in the context of our mood classifiers [1,3].“Butter Lyrics Over Hominy Grit”: Comparing Audio and Psychology-based Text Features in MIR Tasks
Jaehun Kim, Andrew M. Demetriou, Sandy Manolios, M. Stella Tavella and Cynthia C. S. Liem
Interesting work evaluating psychology-inspired features extracted from the lyrics on common MIR tasks showing promising results. I believe that the message of lyrics (or even its aesthetic codes) are a powerful (yet cheap) source of information to explain many of the human ways of understanding music. Nice to see work in that direction.
References
[1] Alonso-Jiménez, P., Bogdanov, D., & Serra, X. (2020). Deep embeddings with Essentia models. In 21th Conference of the International Society for Music Information Retrieval (ISMIR2020). – Late-breaking demo
[2] Bogdanov, D., Porter, A., Boyer, H., & Serra, X. (2016). Cross-collection evaluation for music classification tasks. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR2016).
[3] - Alonso-Jiménez, P., Bogdanov, D., Pons, J., & Serra, X. (2020). TensorFlow Audio Models in Essentia. In 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020).