
Immersive Audio, Video and AI-Enhanced Projection

Exploring immersive multimedia, AI-driven techniques, and audience sensory engagement.


My work with immersive video at UNT’s Center for Experimental Music and Intermedia (CEMI) centers on multi-sensory compositions that combine spatial audio with 270-degree visual projection. These projects often use deep-learning algorithms to generate or manipulate visual content, synchronized with audio spatialization for a unified sensory experience. As a teaching assistant for iARTA and the Hybrid Arts Lab, I have developed technical expertise in VR and immersive projection environments, crafting compositions that probe the interactions between audio, video, and AI-driven processes. This work allows me to experiment with AI-enhanced visual and auditory modalities and to examine how they shape audience perception in multimedia installations.


Two recent pieces exemplify this research approach. foam (2024), for 44-channel audio and 270-degree projection, grew out of hundreds of iterations of prompt-engineered audio synthesis. The audio was generated with Udio, developed further with generative audio fill functions, and meticulously edited in Ableton Live into a cohesive sonic landscape. For the video, I AI-upscaled DALL-E-generated images, stitched them together, and animated them using non-generative tools within Adobe Creative Suite, allowing me to maintain greater control over the visual narrative and its cohesion with the audio.
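To make the stitching step concrete, here is a minimal Python sketch that joins upscaled stills into a single wide strip using the Pillow imaging library. The actual panorama for foam was assembled with AI upscalers and Adobe tools, so the file names, canvas height, and overlap below are placeholders rather than the piece's real pipeline.

```python
# A minimal sketch (not the actual foam pipeline) of stitching upscaled
# stills into one wide strip for panoramic projection, using Pillow.
# File names, canvas height, and overlap are hypothetical placeholders.
from PIL import Image

def stitch_panorama(paths, height=1080, overlap=0):
    """Resize each still to a common height and paste them side by side."""
    frames = []
    for path in paths:
        img = Image.open(path).convert("RGB")
        scale = height / img.height
        frames.append(img.resize((int(img.width * scale), height)))

    total_width = sum(f.width for f in frames) - overlap * (len(frames) - 1)
    strip = Image.new("RGB", (total_width, height))

    x = 0
    for f in frames:
        strip.paste(f, (x, 0))
        x += f.width - overlap
    return strip

# Example: three hypothetical upscaled stills combined into one strip.
panorama = stitch_panorama(["still_01.png", "still_02.png", "still_03.png"])
panorama.save("panorama.png")
```

Keeping this step scriptable makes it easy to regenerate the strip whenever individual stills are re-upscaled or reordered.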


The second piece, Understanding (2024), is a deeply personal exploration of language, belonging, and displacement. This work combines audio sourced from cell-phone recordings provided by twelve non-citizen residents of the U.S., each recording themselves speaking fifty words related to displacement and belonging. These words, selected in collaboration with visual artist Veronica Ibarguengoitia Tena, were then edited into 600 individual audio clips. To facilitate dynamic, real-time control, I placed each clip in a separate slot in Ableton Live and developed a custom Python score in Jupyter Lab using the AbletonOSC library—a project to which I am a contributing developer. This score triggers audio clips algorithmically, creating a constantly shifting soundscape that reflects the complexity of linguistic integration.
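As a rough sketch of this kind of OSC-driven triggering, the Python below fires random clip slots in a Live set through AbletonOSC (which listens on UDP port 11000 by default) using the python-osc library. The track and clip counts and the density curve are illustrative assumptions; the actual score is considerably more elaborate.

```python
# A minimal sketch, in the spirit of the Jupyter score described above, of
# algorithmically triggering Ableton Live clips over OSC via AbletonOSC.
# Track/clip counts and the timing curve are assumptions, not the real score.
import random
import time

from pythonosc.udp_client import SimpleUDPClient

# AbletonOSC listens on UDP port 11000 by default.
client = SimpleUDPClient("127.0.0.1", 11000)

NUM_TRACKS = 12       # one track per contributor (assumption)
CLIPS_PER_TRACK = 50  # one clip per recorded word (assumption)

def trigger_random_clip():
    """Fire a randomly chosen clip slot in the Live set."""
    track = random.randrange(NUM_TRACKS)
    clip = random.randrange(CLIPS_PER_TRACK)
    # /live/clip/fire starts playback of the clip at (track, clip).
    client.send_message("/live/clip/fire", [track, clip])

# Gradually thicken the texture: shorter waits mean denser triggering.
STEPS = 200
for step in range(STEPS):
    trigger_random_clip()
    wait = 0.2 + 0.8 * (1 - step / STEPS)  # seconds between triggers
    time.sleep(wait)
```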


The video component of Understanding was designed in TouchDesigner, where Python scripts in CHOP Execute DATs drive algorithmic processes that control the density and narrative pacing of the text displayed across the 270-degree projection space. Each word is animated through a fascia of ASCII characters, symbolizing the challenges of integrating into a new linguistic and cultural landscape. Together, the audio and visuals convey the nuanced and often fragmented experience of people adapting to a new environment, capturing the emotional layers of displacement and belonging.
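For a sense of how such a CHOP Execute script can work, here is a minimal callback sketch that remaps a control channel to the number of words drawn onto the projection. The operator names ('wordbank', 'display') and the density mapping are hypothetical stand-ins, not the piece's actual network.

```python
# A minimal sketch of a TouchDesigner CHOP Execute callback in the spirit of
# the density control described above. Operator names and the mapping are
# hypothetical; the actual network differs.
import random

def onValueChange(channel, sampleIndex, val, prev):
    # The incoming channel value (0-1) controls how many words appear.
    word_table = op('wordbank')      # Table DAT holding the fifty words
    display = op('display')          # Text TOP rendered onto the projection
    density = max(1, int(val * 10))  # 1-10 words, scaled by the channel value

    rows = word_table.numRows
    chosen = [word_table[random.randrange(rows), 0].val for _ in range(density)]
    display.par.text = '   '.join(chosen)
    return
```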


These immersive compositions underscore my commitment to integrating advanced AI tools and multimedia techniques in ways that deepen the conceptual framework of each piece. By leveraging generative technologies alongside traditional compositional processes, I aim to create installations that invite audiences into complex, interactive environments, pushing the boundaries of what is possible in audio-visual art.
