Cinematic Audio Source Separation Using Visual Cues

CVPR 2026
* indicates equal contribution; + indicates corresponding authors

Abstract

Cinematic Audio Source Separation (CASS) aims to decompose mixed film audio into speech, music, and sound effects, supporting applications like dubbing and remastering. Existing CASS approaches are audio-only, overlooking the inherently audio-visual nature of film, where sounds often align with visual cues. We present the first framework for audio-visual CASS (AV-CASS), leveraging visual context to enhance separation. Our method formulates CASS as a conditional generative modeling problem using conditional flow matching, enabling multimodal audio source separation. To address the lack of paired cinematic datasets with isolated sound sources, we introduce a training data synthesis pipeline that pairs in-the-wild audio and video streams (e.g., facial videos for speech, scene videos for effects) and develop a dedicated visual encoder for this dual-stream setup. Trained on synthetic data, our model generalizes effectively to real-world cinematic content and achieves strong performance on synthetic, real-world, and audio-only CASS benchmarks.
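The conditional flow matching objective mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: it assumes the common straight-line probability path between a Gaussian noise sample and the target stem, and the names `cfm_training_pair`, `cfm_loss`, `model`, and `condition` are hypothetical. Here `condition` stands in for whatever the network is conditioned on (presumably the mixture audio and visual features); the page does not specify the exact path, conditioning, or architecture.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def cfm_training_pair(
    x1: Sequence[float], t: float, rng: random.Random
) -> Tuple[Vector, Vector]:
    """Build one training pair on the straight path from noise x0 to data x1.

    Returns (x_t, target_velocity), where x_t = (1 - t) * x0 + t * x1 and
    the regression target is the constant velocity x1 - x0 (assumed path).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample, same length as x1
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def cfm_loss(
    model: Callable[[Vector, float, object], Vector],
    x1: Sequence[float],
    condition: object,
    rng: Optional[random.Random] = None,
) -> float:
    """Mean squared error between the predicted and target velocity.

    `model(x_t, t, condition)` is a hypothetical velocity predictor; in a
    real system it would be a neural network trained by gradient descent.
    """
    rng = rng or random.Random()
    t = rng.random()  # time drawn uniformly from [0, 1)
    x_t, target = cfm_training_pair(x1, t, rng)
    predicted = model(x_t, t, condition)
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)
```

At inference time, a model trained this way is typically used by starting from noise and integrating the predicted velocity from t = 0 to t = 1 with an ODE solver, with the conditioning held fixed.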

Interactive Mixer Demo with AV-CASS

Explore the audio mixer demo with AV-CASS for each sample:
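A mixer of this kind rescales each separated stem and sums the results. The following is a minimal sketch of that idea, assuming the three stems are equal-length lists of mono samples; the function name `remix`, the decibel gain convention, and the use of `None` to mute a stem are illustrative assumptions, not details of the demo itself.

```python
from typing import Dict, List, Optional, Sequence


def remix(
    stems: Dict[str, Sequence[float]],
    gains_db: Dict[str, Optional[float]],
) -> List[float]:
    """Sum separated stems after applying a per-stem gain in decibels.

    A stem missing from `gains_db` is kept at 0 dB (unchanged);
    a gain of None mutes that stem entirely.
    """
    length = len(next(iter(stems.values())))
    mixed = [0.0] * length
    for name, samples in stems.items():
        if len(samples) != length:
            raise ValueError(f"stem {name!r} has a different length")
        gain_db = gains_db.get(name, 0.0)
        if gain_db is None:  # muted stem contributes nothing
            continue
        gain = 10.0 ** (gain_db / 20.0)  # decibels to linear amplitude
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```

For example, `remix(stems, {"music": None})` would keep speech and sound effects while dropping the music stem, which is the kind of adjustment used in dubbing or remastering workflows.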


Separation Results

Use the toggle below to switch views. "Comparison with CASS models" compares AV-CASS against the audio-only CASS models BandIt [1] and MRX [2]; "Comparison with DAVIS-Flow" compares AV-CASS against the audio-visual source separation model DAVIS-Flow [3].

Superman | Official Trailer (2025)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

My Oxford Year | Official Trailer | Netflix (2025)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Glass Onion: A Knives Out Mystery | Official Trailer (2022)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Knives Out | Official Trailer (2019)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Interstellar | Official Trailer (2014)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Mission: Impossible - Rogue Nation (2015)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Darkest Hour (2017)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

American Made (2017)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

The Divergent Series: Allegiant (2016)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

The Dark Tower (2017)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Breaking In (2018)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Teen Wolf Too (1987)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

MADAGASCAR | Official Trailer (2010)

Input Video (Animation Movie)
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Materialists | Official Trailer HD | A24 (2025)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

WONKA | Official Trailer (2023)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Good Fortune | Official Trailer (2025)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Good Fortune | Official Trailer (2025)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

DON'T LOOK UP | Official Trailer (2021)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Slow Horses | Official Trailer (2022)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Slow Horses | Official Trailer (2022)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

The Martian | Official Trailer (2015)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Inception (2010)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Knowing (2009)

Input Video (No speech in this clip)
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Lassie Come Home (1943)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Bandslam (2009)

Input Video (No music in this clip)
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Inglourious Basterds (2009)

Input Video (No music in this clip)
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

The Divergent Series: Allegiant (2016)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

San Andreas (2015)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

Jurassic World: Fallen Kingdom (2018)

Input Video
AV-CASS (Ours) - Speech
AV-CASS (Ours) - SFX
AV-CASS (Ours) - Music
BandIt - Speech
BandIt - SFX
BandIt - Music
MRX - Speech
MRX - SFX
MRX - Music

BibTeX

@inproceedings{zhang2026cinematic,
  title={Cinematic Audio Source Separation Using Visual Cues},
  author={Zhang, Kang and Lee, Suyeon and Senocak, Arda and Chung, Joon Son},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}

References

  1. K. N. Watcharasupat et al. A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation. IEEE Open Journal of Signal Processing, 2023.
  2. D. Petermann, G. Wichern, Z.-Q. Wang, and J. Le Roux. The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks. ICASSP, 2022.
  3. C. Huang, S. Liang, Y. Tian, A. Kumar, and C. Xu. High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling. IJCV, 2025.