Publications
Peer Reviewed Articles
Ghosh, S., Kiran, C., Evuru, R., Kumar, S., Tyagi, U., Nieto, O., Jin, Z., Manocha, D., Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs, In review for the 13th International Conference on Learning Representations (ICLR), 2025 (PDF).
Manco, I., Salamon, J., Nieto, O., Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning, In Proc. of the 25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, CA, USA, 2024 (PDF).
Ghosh, S., Kumar, S., Seth, A., Kiran, C., Evuru, R., Tyagi, U., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities, In Proc. of the 19th Conference on Empirical Methods in Natural Language Processing (EMNLP), Miami, Florida, USA, 2024 (Top 5% conference paper (oral), PDF).
Ghosh, S., Seth, A., Kumar, S., Tyagi, U., Evuru, C. K., Ramaneswaran, S., Sakshi, S., Nieto, O., Duraiswami, R., Manocha, D., CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models, Proc. of the 12th International Conference on Learning Representations (ICLR). Vienna, Austria, 2024 (PDF).
Wilkins, J., Salamon, J., Fuentes, M., Bello, J. P., Nieto, O., Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries, Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, USA, 2023 (PDF).
Nieto, O., Jin, Z., Dernoncourt, F., Salamon, J., Efficient Spoken Language Recognition via Multilabel Classification, Proc. of the 24th InterSpeech Conference. Dublin, Ireland, 2023 (PDF).
Tan, R., Burns, A., Ray, A., Plummer, B. A., Nieto, O., Salamon, J., Russell, B., Saenko, K., Language-Guided Audio-Visual Source Separation via Trimodal Consistency, Proc. of the IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR). Vancouver, BC, Canada, 2023 (Top 10% conference paper (highlighted), PDF).
Wu, H., Nieto, O., Bello, J.P., Salamon, J., Audio-Text Models Do Not Yet Leverage Natural Language, Proc. of the 48th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Rhodes, Greece, 2023 (PDF).
Kandpal, N., Nieto, O., Jin, Z., Music Enhancement Via Image Translation and Vocoding, Proc. of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Singapore, 2022 (PDF).
Salamon, J., Nieto, O., Bryan, N. J., Deep Embeddings and Section Fusion Improve Music Segmentation, Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pages 594-601. Online, 2021 (PDF).
Won, M., Oramas, S., Nieto, O., Gouyon, F., Serra, X., Multimodal Metric Learning for Tag-Based Music Retrieval, Proc. of the 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Canada, 2021 (PDF).
Nieto, O., Mysore, G.J., Wang, C.-i., Smith, J.B.L., Schlüter, J., Grill, T., McFee, B., Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications. Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1), pages 246–263, 2020. DOI: http://doi.org/10.5334/tismir.54 (PDF).
Korzeniowski, F., Nieto, O., McCallum, M., Won, M., Oramas, S., Schmidt, E., Mood Classification Using Listening Data, Proc. of the 21st International Society for Music Information Retrieval Conference (ISMIR). Montreal, Quebec, Canada, 2020 (PDF).
Won, M., Chun, S., Nieto, O., Serra, X., Data-Driven Harmonic Filters For Audio Representation Learning, Proc. of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, 2020 (PDF).
Nieto, O., McCallum, M., Davies, M., Robertson, A., Stark, A., Egozy, E., The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music, Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands, 2019 (PDF).
Ren, I. Y., Nieto, O., Koops, H. V., Volk, A., Swierstra, W., Investigating Musical Pattern Ambiguity in a Human Annotated Dataset, Proc. of the 15th International Conference on Music Perception and Cognition (ICMPC), Graz, Austria, 2018 (PDF).
Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., End-to-End Learning for Music Audio Tagging at Scale. Proc. of the 19th International Society for Music Information Retrieval Conference (ISMIR). Paris, France, 2018 (Best Student Paper, PDF).
Oramas, S., Barbieri, F., Nieto, O., Serra, X., Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval (TISMIR). 2018 (PDF).
Ebrahimi, S., Vahabi, H., Prockup, M., Nieto, O., Predicting Audio Advertisement Quality. In Proc. of the 11th ACM International Conference on Web Search and Data Mining (WSDM). Marina Del Rey, CA, USA, 2018 (PDF).
Pons, J., Nieto, O., Prockup, M., Schmidt, E., Ehmann, A., Serra, X., End-to-end Learning for Music Audio Tagging At Scale, Machine Learning for Audio Signal Processing Workshop at NIPS, Long Beach, CA, USA, 2017 (PDF).
Oramas, S., Nieto, O., Sordo, M., Serra, X., A Deep Multimodal Approach for Cold-start Music Recommendation. Deep Learning for Recommender Systems Workshop, RecSys, Como, Italy, 2017 (PDF).
McFee, B., Nieto, O., Farbood, M., Bello, J. P., Evaluating Hierarchical Structure In Music Annotations. Front. Psychol. 8:1337. doi: 10.3389/fpsyg.2017.01337, 2017 (PDF).
Oramas, S., Nieto, O., Barbieri, F., Serra, X., Multi-label Music Genre Classification From Audio, Text, and Images Using Deep Features. Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR). Suzhou, China, 2017 (Best Oral Presentation, PDF).
Nieto, O., Bello, J. P., Systematic Exploration Of Computational Music Structure Research. Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016 (PDF, Slides).
McFee, B., Nieto, O., Bello, J. P., Hierarchical Evaluation of Segment Boundary Detection. Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015 (PDF, Poster).
McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., Nieto, O., LibROSA: Audio and Music Signal Analysis in Python. Proc. of the 14th Python in Science Conference. Austin, TX, USA, 2015 (PDF).
Nieto, O., Farbood, M., Identifying Polyphonic Musical Patterns From Audio Recordings Using Music Segmentation Techniques. Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014 (PDF, Slides).
Nieto, O., Farbood, M., Jehan, T., Bello, J.P., Perceptual Analysis of the F-Measure to Evaluate Section Boundaries in Music. Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014 (PDF, Poster).
Raffel, C., McFee, B., Humphrey, E., Salamon, J., Nieto, O., Liang, D., Ellis, D., mir_eval: A Transparent Implementation of Common MIR Metrics. Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014 (Best Poster Presentation, PDF).
Humphrey, E. J., Salamon, J., Nieto, O., Forsyth, J., Bittner, R., Bello, J.P., JAMS: A JSON Annotated Music Specification for Reproducible MIR Research. Proc. of the 15th International Society for Music Information Retrieval Conference (ISMIR). Taipei, Taiwan, 2014 (PDF).
Ballús, A., Arnau, E., Nieto, O., Font, F., Torrents, A. G., Embodying Theoretical Research in Music Cognition: Four Proposals for Theory-Driven Experimentation. In Proc. of the 13th International Conference on Music Perception and Cognition. Seoul, South Korea, 2014 (PDF, Poster).
Nieto, O., Bello, J.P., Music Segment Similarity Using 2D-Fourier Magnitude Coefficients. Proc. of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Florence, Italy, 2014 (PDF).
Nieto, O., Shasha, D., Hand Gesture Recognition in Mobile Devices: Enhancing The Musical Experience. Proc. of the 10th International Symposium on Computer Music Multidisciplinary Research (CMMR). Marseille, France, 2013 (PDF).
Humphrey, E. J., Nieto, O., Bello, J. P., Data Driven and Discriminative Projections for Large-Scale Cover Song Identification. Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013 (PDF).
Park, T.H., Crawford, L., Nieto, O., Even More Tactile Feedback for Mobile Devices. Proc. of the 39th International Computer Music Conference (ICMC), Perth, Australia, 2013 (PDF).
Park, T.H., Nieto, O., Fortissimo: Force-Feedback for Mobile Devices. Proc. of the 13th International Conference on New Interfaces for Musical Expression (NIME), KAIST, South Korea, 2013 (PDF).
Nieto, O., Unsupervised Clustering of Extreme Vocal Effects. Proc. of the 10th International Conference Advances in Quantitative Laryngology, Voice and Speech Research (AQL), pages 115-116. Cincinnati, OH, USA, 2013 (PDF).
Nieto, O., Jehan, T., Convex Non-negative Matrix Factorization For Automatic Music Structure Identification. Proc. of the 38th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vancouver, Canada, 2013 (PDF).
Nieto, O., Humphrey, E. J., Bello, J. P., Compressing Music Recordings into Audio Summaries. Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR), pages 313-318. Porto, Portugal, 2012 (PDF).
Nieto, O., Farbood, M., Perceptual Evaluation of Automatically Extracted Musical Motives. Proc. of the 12th International Conference on Music Perception and Cognition (ICMPC), pages 723-727. Thessaloniki, Greece, 2012 (PDF).
Algorithms
Nieto, O., MIREX 2016 Entry: MSAF V0.1.0 Submission, New York City, NY, USA, 2016 (PDF, code).
Nieto, O., Bello, J.P., MIREX 2014 Entry: 2D Fourier Magnitude Coefficients. Music Information Retrieval Evaluation eXchange (MIREX), Taipei, Taiwan, 2014 (PDF, code).
Nieto, O., Farbood, M., MIREX 2014 Entry: Music Segmentation Techniques and Greedy Path Finder Algorithm to Discover Musical Patterns. Music Information Retrieval Evaluation eXchange (MIREX), Taipei, Taiwan, 2014 (PDF, code).
Nieto, O., Jehan, T., MIREX 2014 Entry: Convex Non-negative Matrix Factorization. Music Information Retrieval Evaluation eXchange (MIREX), Taipei, Taiwan, 2014 (PDF, code).
Nieto, O., Farbood, M., MIREX 2013: Discovering Musical Patterns Using Audio Structural Segmentation Techniques. Music Information Retrieval Evaluation eXchange (MIREX), Curitiba, Brazil, 2013 (PDF).
Theses
Nieto, O., Discovering Structure in Music: Automatic Approaches and Perceptual Evaluations. New York University. PhD Dissertation, 2015 (Slides, Defense Video, PDF).
Nieto, O., Voice Transformations for Extreme Vocal Effects. Pompeu Fabra University. Master's Thesis, 2008 (PDF).
Nieto, O., Desenvolupament Open Source per a E-Learning-II. Polytechnic University of Catalonia. Undergraduate Thesis, 2007 (PDF).
Selected Talks
Nieto, O., Overview, Challenges, and Applications of Audio-based Music Structure Analysis. Women in Music Information Retrieval Workshop (ISMIR). Virtual, 2021 (Slides).
Nieto, O., Music Recommendation with Waveform-based Architectures. 4th Global AI Conference. Santa Clara, CA, USA, 2020 (Slides).
Nieto, O., Spectral Analysis and Detection of Extreme Vocal Effects (with CNNs). Research Seminar. Universitat Pompeu Fabra. Barcelona, Spain, 2019 (Slides).
Nieto, O., Spectral Analysis and Detection of Extreme Vocal Effects. 2nd International Symposium on Distorted Voices. São Paulo, Brazil, 2019 (Slides).
Nieto, O., Recommending Music with Waveform Architectures at Scale (Extended Version). Seminar Series in Data Science. University of San Francisco, San Francisco, CA, USA, 2019 (Slides).
Nieto, O., Recommending Music with Waveform Architectures at Scale. Deep Learning Barcelona Symposium. Pompeu Fabra University, Barcelona, Spain, 2018 (Slides, Video).
Nieto, O., Cold-Start Music Recommendation Using Multimodal Deep Architectures. Systematic approaches to deep learning methods for audio. Erwin Schrödinger Institute, University of Vienna, Austria, 2017 (PDF).
Nieto, O., Long Tail Music Recommendation Using Deep Architectures. International Workshop on Deep Learning for Music (International Joint Conference on Neural Networks). Anchorage, AK, USA, 2017 (PDF).
Nieto, O., Deep Learning for Large-Scale Music Recommendation. Data-Driven Research in Music Cognition. Stanford University, Stanford, CA, USA, 2017 (PDF).
Nieto, O., Deep Learning for Music Recommendation: Machine Listening and Collaborative Filtering. Seminar on Music Knowledge Extraction Using Machine Learning. Pompeu Fabra University, Barcelona, Spain, 2016 (PDF).
Nieto, O., Deep Learning for Large Scale Music Recommendation. Biostat Seminar. Stanford, CA, USA, 2016 (PDF).
Nieto, O., Farbood, M., Multiple Annotations and Subjectivity in the Identification of Segment Boundaries in Music. Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2014 (PDF).
Nieto, O., Bello, J.P., Music Segment Similarity Using 2D-Fourier Magnitude Coefficients. North East Music Information Special Interest Group (NEMISIG). New York, NY, USA, 2014 (PDF).
Nieto, O., Farbood, M., Bello, J.P., A Perceptually Based Evaluation of Music Boundaries. Cognitive Music Information Retrieval (CogMIR). Toronto, ON, Canada, 2013 (PDF).
Nieto, O., Music Structure Analysis and New Musical Interfaces. Pompeu Fabra University, Barcelona, Spain, 2013 (PDF).
Nieto, O., Jehan, T., Music Structure Analysis by Matrix Factorization. North East Music Information Special Interest Group (NEMISIG). Boston, MA, USA, 2013 (PDF).
Music
Bolsa, D., Nieto, O., La Bossa d'Urina: El Primer Disc, Published by Record Union. 2022 (Pandora, Spotify, Amazon).
Cobo, L. C., Cardona, J., Grant, E. E., Melendo, D., Nieto, O., Rumbahía: Casi al Compás, Published by CDBaby. 2021 (Pandora, Spotify, Amazon).
Cobo, L. C., Cardona, J., Gallegos, J., Melendo, D., Nieto, O., Rumbahía: Aprendiendo, Published by CDBaby. 2019 (Pandora, Spotify, Amazon).
Bolsa, D., Nieto, O., La Bossa d'Urina: Merda Fina, Published by Record Union. 2018 (Pandora, iTunes, Spotify, Amazon).
Henson, S., Nieto, O., Nuñez, J., Remas, E., Rickher, G., Arkaen: Arkaen, Published by Record Union. 2017 (Pandora, iTunes, Spotify, Amazon).
Bolsa, D., Nieto, O., La Bossa d'Urina: La Bossa d'Urina, Published by Cydonia Records. 2015 (Pandora, iTunes, Spotify, Amazon).
Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Sargon: Vida, Album edited by Weight Recordings. 2009 (Pandora, iTunes, Spotify, Amazon).
Ferreiro, C., Llobet, J., Nieto, O., Prim, M., Sargon: Transcriptions, Album edited by Big Bang Records. 2005.
Other
Won, M., Chun, S., Nieto, O., Serra, X., Automatic Music Tagging with Harmonic CNN. Late Breaking Session of the International Society for Music Information Retrieval Conference (ISMIR). Delft, The Netherlands, 2019 (PDF).
Nieto, O., Bello, J. P., MSAF: Music Structure Analysis Framework. International Society for Music Information Retrieval Conference (ISMIR). Málaga, Spain, 2015 (PDF).
Nieto, O., Smith, J. B. L., 2013 Late Break Session on Music Segmentation. Proc. of the 14th International Society for Music Information Retrieval Conference (ISMIR). Curitiba, Brazil, 2013 (PDF).
Rocha, B., Smith, J.B.L., Peeters, G., Ross, J.C., Nieto, O., Van Balen, J., Late-break Session on Music Structure Analysis. Proc. of the 13th International Society for Music Information Retrieval Conference (ISMIR). Porto, Portugal, 2012 (PDF).
Nieto, O., Pajuelo, A., López, D., Millan, A., Heredero, A., Duran, A., Herrero, J.R., Verdú, X., Becerra, Y., Morancho, E., Sistemas Operativos: Cuaderno de Laboratorio. Department of Computer Architecture. Polytechnic University of Catalonia. ISBN 978-84-612-1002-2. 2007.