
AI-Driven Audio Synthesis with Recurrent Neural Networks

Exploring AI-driven audio synthesis: RNN innovation for computer music composition.


One of the core aspects of my research is developing AI-driven audio synthesis using recurrent neural networks (RNNs). In collaboration with Dr. Marco Buongiorno Nardelli, I am preparing to train a new RNN model for audio synthesis based on the SampleRNN framework. We are enhancing this model to support multi-threaded processing, addressing the limitations of the current SampleRNN architecture in handling large training datasets. This project aims to produce more nuanced and complex audio outputs suitable for real-time and generative music applications. Additionally, we are exploring advanced tuning techniques to optimize the model’s musicality, creating tools that will enable composers to generate high-quality audio with computational assistance. Our research has the potential to set new standards in AI-assisted composition, contributing to both academic inquiry and practical applications in music technology.
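
As a rough illustration of the kind of parallel data handling this work calls for, the sketch below uses a PyTorch-style pipeline in which multiple workers load and chunk audio concurrently during training (PyTorch uses worker processes rather than threads per se). The CelloChunks class, the file name, and the chunk length are hypothetical placeholders for illustration only, not our actual codebase.

```python
# Minimal sketch: parallel loading of fixed-length audio chunks for RNN training.
# Class names, paths, and sizes are illustrative placeholders.
import torch
import torchaudio
from torch.utils.data import Dataset, DataLoader

class CelloChunks(Dataset):
    """Serves fixed-length sample sequences cut from a list of WAV files."""
    def __init__(self, paths, chunk_len=16000):
        self.chunk_len = chunk_len
        self.index = []  # (path, start_sample) pairs
        for p in paths:
            info = torchaudio.info(p)
            for start in range(0, info.num_frames - chunk_len, chunk_len):
                self.index.append((p, start))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        path, start = self.index[i]
        wav, _ = torchaudio.load(path, frame_offset=start, num_frames=self.chunk_len)
        x = wav.mean(dim=0)      # fold to mono
        return x[:-1], x[1:]     # input samples and next-sample targets

# num_workers > 0 spreads file decoding and I/O across worker processes,
# the kind of parallelism a large training corpus needs.
loader = DataLoader(CelloChunks(["suite2_take1.wav"]), batch_size=8,
                    shuffle=True, num_workers=4)
```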


I use these techniques in my concert-length program Traces of the Imaginary, of which my work Our Little Ones’ Feet is a part. This program, a collaboration with Dr. Buongiorno Nardelli, premiered at the Currents Festival in Santa Fe, co-sponsored by the Santa Fe Institute, in June 2024. The program is structured around J.S. Bach’s Cello Suite No. 2 in D minor, with works for cello and electronics interwoven between each movement to create a dialogue between historical repertoire and contemporary technological practices.


Two pieces on this program incorporate generative audio synthesis using SampleRNN, a recurrent neural network model that Dr. Buongiorno Nardelli and I trained on three hours of my own cello recordings. These recordings, captured in my home studio while I was learning the music for Traces, formed the model's entire training set. While the synthesis produced fascinating results, it was also prone to hallucination, producing unexpected and often unusable sonic artifacts. Only about 5% of the generated material was musically viable, highlighting both the promise and the current limitations of SampleRNN for creative applications.
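
For readers unfamiliar with how SampleRNN-family models produce sound, the sketch below shows the core autoregressive loop in miniature: a small GRU predicts one quantized sample at a time and feeds it back in. It is a toy illustration of the general idea under assumed settings (256 quantization levels, a single recurrent tier), not the model we trained for Traces.

```python
# Toy sketch of sample-by-sample (autoregressive) audio generation with a GRU.
import torch
import torch.nn as nn

QUANT = 256  # number of quantization levels (e.g. 8-bit mu-law)

class TinySampleRNN(nn.Module):
    def __init__(self, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(QUANT, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, QUANT)

    def forward(self, x, state=None):
        h, state = self.gru(self.embed(x), state)
        return self.head(h), state

@torch.no_grad()
def generate(model, n_samples, temperature=1.0):
    """Draw one quantized sample at a time, feeding each back into the model."""
    model.eval()
    sample = torch.full((1, 1), QUANT // 2, dtype=torch.long)  # start at mid-level (silence)
    state, out = None, []
    for _ in range(n_samples):
        logits, state = model(sample, state)
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        sample = torch.multinomial(probs, 1)
        out.append(sample.item())
    return out  # quantized samples; decode with inverse mu-law to recover audio

audio = generate(TinySampleRNN(), n_samples=16000)  # about one second at 16 kHz
```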


Building on these initial experiments, I aim to develop a new proprietary model inspired by SampleRNN but with a deeper architecture that can handle larger datasets and generate higher-quality audio. Such a model would not only improve the usability of AI-generated audio but also open new avenues for composers to integrate machine learning into their creative processes, contributing to both the academic and practical applications of AI in music composition.
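
As a hint of what a deeper, tiered design could look like, the sketch below pairs a frame-level RNN that summarizes coarse context with a sample-level RNN that it conditions, loosely in the spirit of SampleRNN's multi-tier structure. Layer counts, hidden sizes, and the frame length are illustrative assumptions, not a finished architecture.

```python
# Hedged sketch of a two-tier (frame-level + sample-level) recurrent model.
import torch
import torch.nn as nn

class TieredRNN(nn.Module):
    def __init__(self, quant=256, hidden=512, frame=16):
        super().__init__()
        self.frame = frame
        self.frame_rnn = nn.GRU(frame, hidden, num_layers=2, batch_first=True)
        self.embed = nn.Embedding(quant, hidden)
        self.sample_rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, quant)

    def forward(self, samples_float, samples_idx):
        # Coarse tier: one RNN step per frame of raw samples
        # (sequence length must be divisible by the frame size).
        B, T = samples_float.shape
        frames = samples_float.view(B, T // self.frame, self.frame)
        coarse, _ = self.frame_rnn(frames)                  # (B, T/frame, hidden)
        # Upsample the coarse context to per-sample resolution.
        cond = coarse.repeat_interleave(self.frame, dim=1)  # (B, T, hidden)
        # Fine tier: predict each next sample from its embedding plus coarse context.
        fine, _ = self.sample_rnn(self.embed(samples_idx) + cond)
        return self.head(fine)                              # (B, T, quant) logits
```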
