# Modeling strategies for speech enhancement in the latent space of a neural audio codec
This repository provides the official model checkpoints for the paper *Modeling strategies for speech enhancement in the latent space of a neural audio codec*, authored by Sofiene Kammoun, Xavier Alameda-Pineda, and Simon Leglaive, and published at IEEE ICASSP 2026.
We explore different modeling strategies (autoregressive vs. non-autoregressive) and representation spaces (discrete vs. continuous) for speech enhancement using neural audio codecs and Conformer-based architectures.
arXiv | Code and audio examples | BibTeX
## Overview
Our work introduces and compares a family of speech enhancement models that vary systematically along two main axes:
- **Representation type**
  - Discrete tokens
  - Continuous latent vectors
- **Modeling strategy**
  - **Autoregressive (AR)**: sequential prediction of the clean speech representation
  - **Non-autoregressive (NAR)**: parallel prediction of the clean speech representation
The current release includes the following models:
| Model Name | Modeling Strategy | Input Representation | Output Representation | Model Checkpoint |
|---|---|---|---|---|
| D-AR | Autoregressive | Discrete | Discrete | `D-AR_ckpt_300.pt` |
| D-NAR | Non-autoregressive | Discrete | Discrete | `D-NAR_ckpt_300.pt` |
| D-NAR* | Non-autoregressive | Continuous | Discrete | `D-NAR_star_ckpt_300.pt` |
| C-AR | Autoregressive | Continuous | Continuous | `C-AR_ckpt_300.pt` |
| C-NAR | Non-autoregressive | Continuous | Continuous | `C-NAR_ckpt_300.pt` |
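The checkpoints in the table are standard PyTorch `.pt` files, so loading one might look like the sketch below. The `"state_dict"` wrapper key and the `load_checkpoint` helper name are assumptions for illustration, not something documented by this repository; inspect the file contents if the layout differs.

```python
import torch


def load_checkpoint(path: str):
    """Load a .pt checkpoint on CPU and return its state dict.

    Assumes the file is either a raw state dict or a dict wrapping one
    under a "state_dict" key (an assumption, not a documented layout).
    """
    # map_location="cpu" lets the checkpoint load on machines without a GPU.
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt
```

A typical usage, assuming the matching Conformer model has been instantiated, would be `model.load_state_dict(load_checkpoint("C-NAR_ckpt_300.pt"))`.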
Additional models:
- **C-FT** (`C-FT-encoder_ckpt_300.pt`) and **D-FT** (`D-FT-encoder_ckpt_300.pt`), where only the NAC's encoder is fine-tuned, with an MSE loss and a cross-entropy loss, respectively.
- **STFT-NAR** (`STFT_NAR_Mask_ckpt_300.pt`), which operates on STFT representations instead of the NAC's embeddings and is trained to output an STFT mask.
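For the STFT-based variant, masking means scaling each time-frequency bin of the noisy spectrogram before inverting back to a waveform. The sketch below shows this mechanism with illustrative STFT settings (`n_fft=512`, `hop=128` are assumptions, not the paper's configuration), not the actual STFT-NAR inference code.

```python
import torch


def apply_stft_mask(noisy: torch.Tensor, mask: torch.Tensor,
                    n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Apply a [0, 1] mask to the STFT of a noisy waveform and resynthesize.

    `noisy` is a 1-D waveform; `mask` must match the STFT shape
    (n_fft // 2 + 1 frequency bins, num_frames). The STFT parameters
    here are illustrative, not the paper's settings.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy, n_fft, hop_length=hop, window=window,
                      return_complex=True)   # complex (freq, frames)
    enhanced_spec = mask * spec              # element-wise bin scaling
    return torch.istft(enhanced_spec, n_fft, hop_length=hop,
                       window=window, length=noisy.shape[-1])
```

With an all-ones mask this reduces to an STFT round trip, which is a quick sanity check that the analysis/synthesis settings are consistent.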